Gym Interface

Standard Gymnasium (successor to OpenAI Gym) training-loop patterns for reinforcement learning research.

Overview

This example demonstrates:

- Inspecting observation and action spaces
- Sampling random actions
- Running multiple episodes
- Tracking episode statistics

Key Concepts

Observation Space

A Dict space containing object states and contact information:

print(env.observation_space)

Action Space

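A Box space with shape (3,) holding the (x, y, radius) of the object to place. In two_body_problem, x and y range over [-5, 5] and radius over [0.1, 1.5], as the expected output below shows: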

env.action_space.sample()  # Returns random (x, y, radius)
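Hand-crafted actions (as opposed to sampled ones) should stay inside the Box bounds. The sketch below clamps an (x, y, radius) tuple into range with plain Python. The bounds are copied from the two_body_problem output and are an assumption for any other level; for a Gymnasium Box, `env.action_space.low` and `env.action_space.high` are the authoritative source. `clip_action` and `in_bounds` are hypothetical helpers, not part of Interphyre.

```python
# Bounds assumed from the two_body_problem example output; read
# env.action_space.low / env.action_space.high for the actual level.
LOW = (-5.0, -5.0, 0.1)   # (x, y, radius) lower bounds
HIGH = (5.0, 5.0, 1.5)    # (x, y, radius) upper bounds


def clip_action(action, low=LOW, high=HIGH):
    """Clamp each component of an (x, y, radius) action into [low, high]."""
    return tuple(min(max(a, lo), hi) for a, lo, hi in zip(action, low, high))


def in_bounds(action, low=LOW, high=HIGH):
    """Return True if every component lies inside the closed Box bounds."""
    return all(lo <= a <= hi for a, lo, hi in zip(action, low, high))


raw = (6.2, 3.0, 0.05)        # x too far right, radius too small
print(in_bounds(raw))          # False
print(clip_action(raw))        # (5.0, 3.0, 0.1)
```

Clamping keeps the rest of the intended placement intact, which is usually preferable to rejecting the action outright.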

Episode Structure

Each episode:

1. reset(): initialize the environment
2. step(action): run the full simulation with the placed object
3. Check terminated and info['success']

Note: Unlike most RL environments, a single step() call runs the entire physics simulation, so each episode consists of exactly one decision.
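To make the single-decision structure concrete, here is a toy stand-in written against the same reset()/step() signatures used in the code example. `SingleDecisionEnv` and its success rule (the object's bottom edge must end above a randomly drawn target height) are invented for illustration and do not model Interphyre's physics; the point is only that step() always returns `terminated=True`, so one action ends the episode.

```python
import random


class SingleDecisionEnv:
    """Hypothetical stand-in mimicking a one-step, Gymnasium-style episode."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._target_y = 0.0

    def reset(self, seed=None):
        # Re-seed when asked, mirroring Gymnasium's reset(seed=...) convention
        if seed is not None:
            self._rng.seed(seed)
        # Hypothetical initial state: a target height drawn per episode
        self._target_y = self._rng.uniform(-2.0, 2.0)
        return {"target_y": self._target_y}, {}

    def step(self, action):
        x, y, radius = action
        # Stand-in for "run the whole simulation": judge the single placement
        success = y - radius > self._target_y
        reward = 1.0 if success else -1.0
        obs = {"target_y": self._target_y}
        terminated = True    # one decision always ends the episode
        truncated = False
        return obs, reward, terminated, truncated, {"success": success}


env = SingleDecisionEnv(seed=0)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step((0.0, 4.0, 0.5))
print(terminated, info["success"])  # True True
```

Because every episode terminates after one step, "episodes" and "decisions" are interchangeable when counting samples for training.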

Code Example

from interphyre import InterphyreEnv

env = InterphyreEnv("two_body_problem", seed=42)

# Inspect spaces
print(f"Observation: {env.observation_space}")
print(f"Action: {env.action_space}")

# Run episodes
for episode in range(5):
    obs, info = env.reset()
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    print(f"Episode {episode + 1}: "
          f"action=({action[0]:.2f}, {action[1]:.2f}, {action[2]:.2f}) "
          f"reward={reward:+.1f} "
          f"{'SUCCESS' if info['success'] else 'FAIL'}")

env.close()

Running the Example

python demos/gym_interface.py

Expected Output

Gym Interface Demo: Random actions on multiple levels

Level: two_body_problem
  Observation: Dict(...)
  Action: Box([-5. -5. 0.1], [5. 5. 1.5], (3,), float32)
  Episode 1: action=(3.86, 2.49, 0.59) reward=-0.1 FAIL
  Episode 2: action=(1.17, -1.52, 0.63) reward=-1.0 FAIL
  ...

Results: 0/5 successful
Average reward: -0.46

Total: 0 successes (random actions rarely solve puzzles)
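The "Results" and "Average reward" summary lines can be derived from per-episode records. A minimal sketch, assuming each episode is stored as a `(reward, success)` pair; `summarize` is a hypothetical helper, not part of Interphyre, the sample records are made up, and the demo script may compute its summary differently.

```python
def summarize(results):
    """Return (successes, episodes, average reward) for (reward, success) pairs."""
    episodes = len(results)
    successes = sum(1 for _, success in results if success)
    avg_reward = sum(reward for reward, _ in results) / episodes if episodes else 0.0
    return successes, episodes, avg_reward


# Hypothetical per-episode records, e.g. appended inside the episode loop as
# results.append((reward, info["success"]))
results = [(-0.1, False), (-1.0, False), (1.0, True), (-0.5, False)]
successes, episodes, avg = summarize(results)
print(f"Results: {successes}/{episodes} successful")
print(f"Average reward: {avg:.2f}")
```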

Training Tips

Action Sampling Strategies

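import numpy as np

# Uniform random over the full action space (default)
action = env.action_space.sample()

# Biased toward the upper region, since placed objects fall under gravity
rng = np.random.default_rng(42)  # seeded generator for reproducible sampling
action = np.array([
    rng.uniform(-3.0, 3.0),   # x: near the center
    rng.uniform(2.0, 4.5),    # y: upper region
    rng.uniform(0.3, 0.7),    # radius: medium
], dtype=np.float32)          # match the Box's float32 dtype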

Multiple Levels

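from interphyre import InterphyreEnv
from interphyre.levels import list_levels

for level_name in list_levels():
    env = InterphyreEnv(level_name, seed=42)
    # Train on each level...
    env.close()  # release resources before creating the next level's env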

Deterministic Replay

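import numpy as np

# Same seed = same initial state
obs, info = env.reset(seed=42)
# Same seed + same action = same result on a given level
action = np.array([0.5, 3.0, 0.6], dtype=np.float32)  # (x, y, radius), shape (3,)
obs, reward, terminated, truncated, info = env.step(action)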