Pistonball¶
This environment is part of the butterfly environments. Please read that page first for general information.
| Import | `from pettingzoo.butterfly import pistonball_v6` |
|---|---|
| Actions | Either |
| Parallel API | Yes |
| Manual Control | Yes |
| Agents | `agents= ['piston_0', 'piston_1', ..., 'piston_19']` |
| Agents | 20 |
| Action Shape | (1,) |
| Action Values | [-1, 1] |
| Observation Shape | (457, 120, 3) |
| Observation Values | (0, 255) |
| State Shape | (560, 880, 3) |
| State Values | (0, 255) |
This is a simple physics based cooperative game where the goal is to move the ball to the left wall of the game border by activating the vertically moving pistons. Each piston agent's observation is an RGB image of the two pistons (or the wall) next to the agent and the space above them. Every piston can be acted on at any time. The action space in discrete mode is 0 to move down, 1 to stay still, and 2 to move up. In continuous mode, the value in the range [-1, 1] is proportional to the amount that the pistons are raised or lowered by. Continuous actions are scaled by a factor of 4, so that in both the discrete and continuous action spaces a maximal action moves a piston 4 pixels up and a minimal action moves it 4 pixels down.
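To make the scaling concrete, here is a minimal sketch of the action-to-displacement rule described above (a hypothetical helper, not the environment's internal implementation):

```python
# Hypothetical sketch of the action-to-displacement rule described above;
# not the environment's internal implementation.
def piston_displacement(action, continuous=True):
    if continuous:
        # action is a float in [-1, 1]; +1 moves the piston 4 pixels up,
        # -1 moves it 4 pixels down
        return 4.0 * float(action)
    # discrete: 0 = down, 1 = stay, 2 = up, i.e. (action - 1) in {-1, 0, 1}
    return 4 * (action - 1)

assert piston_displacement(1.0) == 4.0
assert piston_displacement(0, continuous=False) == -4
```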
Accordingly, pistons must learn highly coordinated emergent behavior to achieve an optimal policy for the environment. Each agent gets a reward that is a combination of how much the ball moved left overall and how much the ball moved left if it was close to the piston (i.e. movement the piston contributed to). A piston is considered close to the ball if it is directly below any part of the ball. Balancing the ratio between these local and global rewards appears to be critical to learning this environment, and as such is an environment parameter. The local reward applied is 0.5 times the change in the ball's x-position. Additionally, the global reward is the change in x-position divided by the starting position, times 100, plus the time_penalty (default -0.1). For each piston, the reward is local_ratio * local_reward + (1 - local_ratio) * global_reward. The local reward is applied to pistons surrounding the ball while the global reward is provided to all pistons.
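Putting those pieces together, a minimal sketch of the per-piston reward might look as follows; the names are illustrative (not the environment's internal API), and local_ratio is the environment parameter mentioned above:

```python
# Hypothetical sketch of the reward combination described above;
# names are illustrative, not the environment's internal API.
def piston_reward(dx_left, start_x, is_near_ball, local_ratio, time_penalty=-0.1):
    """dx_left: the ball's leftward movement this step, in pixels."""
    # Local reward: 0.5 x the ball's leftward movement, only for pistons
    # directly below any part of the ball.
    local_reward = 0.5 * dx_left if is_near_ball else 0.0
    # Global reward: leftward movement relative to the starting x-position,
    # times 100, plus the per-step time penalty; given to all pistons.
    global_reward = 100.0 * dx_left / start_x + time_penalty
    return local_ratio * local_reward + (1.0 - local_ratio) * global_reward
```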
Pistonball uses the Chipmunk physics engine, and thus the physics are about as realistic as in the game Angry Birds.
Manual Control¶
Keys a and d control which piston is selected to move (initially the rightmost piston is selected), and keys w and s move the selected piston in the vertical direction.
Arguments¶
```python
pistonball_v6.env(n_pistons=20, time_penalty=-0.1, continuous=True,
                  random_drop=True, random_rotate=True, ball_mass=0.75,
                  ball_friction=0.3, ball_elasticity=1.5, max_cycles=125)
```
n_pistons
: The number of pistons (agents) in the environment.
time_penalty
: Amount of reward added to each piston at each time step. More negative values place higher weight on moving the ball across the screen quickly to terminate the game.
continuous
: If True, the piston action is a real value between -1 and 1 which is added to the piston height. If False, the action is a discrete value that moves the piston one unit up or down.
random_drop
: If True, the ball will initially spawn at a random x position. If False, the ball will always spawn at x=800.
random_rotate
: If True, the ball will spawn with a random angular momentum.
ball_mass
: Sets the mass of the ball physics object.
ball_friction
: Sets the friction of the ball physics object.
ball_elasticity
: Sets the elasticity of the ball physics object.
max_cycles
: After max_cycles steps, all agents will return done.
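For example, overriding the defaults above to get a smaller, discrete-action instance might look like:

```python
from pettingzoo.butterfly import pistonball_v6

# 10 pistons, discrete actions (0 = down, 1 = stay, 2 = up),
# deterministic ball spawn, and shorter episodes
env = pistonball_v6.env(
    n_pistons=10,
    continuous=False,
    random_drop=False,
    random_rotate=False,
    max_cycles=100,
)
env.reset(seed=42)
```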
Version History¶
v6: Fix ball bouncing off of left wall.
v5: Ball moving into the left column due to physics engine imprecision no longer gives additional reward
v4: Changed default arguments for max_cycles and continuous, bumped PyMunk version (1.6.0)
v3: Refactor, added number of pistons argument, minor visual changes (1.5.0)
v2: Misc fixes, bumped PyGame and PyMunk version (1.4.0)
v1: Fix to continuous mode (1.0.1)
v0: Initial versions release (1.0.0)
Usage¶
AEC¶
```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        # this is where you would insert your policy
        action = env.action_space(agent).sample()

    env.step(action)
env.close()
```
Parallel¶
```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env(render_mode="human")
observations, infos = env.reset()

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```
API¶
- class pettingzoo.butterfly.pistonball.pistonball.raw_env(n_pistons=20, time_penalty=-0.1, continuous=True, random_drop=True, random_rotate=True, ball_mass=0.75, ball_friction=0.3, ball_elasticity=1.5, max_cycles=125, render_mode=None)[source]¶
- action_space(agent)[source]¶
Takes in agent and returns the action space for that agent.
MUST return the same value for the same agent name
Default implementation is to return the action_spaces dict
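For instance, assuming the default continuous action space, the per-agent space can be inspected directly:

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset(seed=42)
# With continuous=True (the default), each piston's action space is a
# Box in [-1, 1] with shape (1,)
print(env.action_space(env.agents[0]))
```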
- close()[source]¶
Closes any resources that should be released.
This includes the rendering window, subprocesses, network connections, and any other resources held by the environment.
- observation_space(agent)[source]¶
Takes in agent and returns the observation space for that agent.
MUST return the same value for the same agent name
Default implementation is to return the observation_spaces dict
- observe(agent)[source]¶
Returns the observation an agent currently can make.
last() calls this function.
- render()[source]¶
Renders the environment as specified by self.render_mode.
Render mode can be human to display a window. Other render modes in the default environments are ‘rgb_array’ which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’ which returns the strings printed (specific to classic environments).