Pistonball¶

This environment is part of the butterfly environments. Please read that page first for general information.
| | |
|---|---|
| Import | `from pettingzoo.butterfly import pistonball_v6` |
| Actions | Either |
| Parallel API | Yes |
| Manual Control | Yes |
| Agents | `agents= ['piston_0', 'piston_1', ..., 'piston_19']` |
| Agents | 20 |
| Action Shape | (1,) |
| Action Values | [-1, 1] |
| Observation Shape | (457, 120, 3) |
| Observation Values | (0, 255) |
| State Shape | (560, 880, 3) |
| State Values | (0, 255) |
This is a physics-based cooperative game where the goal is to move the ball to the left wall of the game border by activating the vertically moving pistons. To achieve an optimal policy for the environment, the pistons must learn highly coordinated behavior.
Observations: Each piston agent’s observation is an RGB image encompassing the piston, its immediate neighbors (either two pistons, or a piston and the left or right wall) and the space above them (which may show the ball).
Actions: Every piston can be acted on at each time step. In discrete mode, action 0 moves the piston down by 4 pixels, 1 keeps it still, and 2 moves it up by 4 pixels. In continuous mode, the action value in the range [-1, 1] is proportional to the amount the piston is lowered or raised. Continuous actions are scaled by a factor of 4 to match the distance travelled in discrete mode, e.g. an action of -1 moves the piston down 4 pixels.
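The mapping above can be sketched as a small helper (an illustration only; `piston_delta` is a hypothetical name, not part of the PettingZoo API):

```python
def piston_delta(action, continuous=True):
    """Convert a raw action into a vertical pixel displacement.

    Discrete mode: 0 -> down 4 px, 1 -> stay, 2 -> up 4 px.
    Continuous mode: values in [-1, 1] are scaled by a factor of 4.
    """
    if continuous:
        return 4 * max(-1.0, min(1.0, action))  # clamp to [-1, 1], then scale
    return 4 * (action - 1)  # maps {0, 1, 2} to {-4, 0, +4}

# A continuous action of -1 matches the discrete "move down" step:
assert piston_delta(-1.0) == piston_delta(0, continuous=False) == -4
```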
Rewards: The same reward is provided to every agent, based on how far the ball moved left in the last time step (moving right yields a negative reward), plus a constant time penalty. The distance component is the percentage of the initial distance to the left wall (i.e. at game start) travelled in the past time step. For example, if the ball began the game 300 pixels away from the wall, began the time step 180 pixels away and finished it 175 pixels away, the distance reward would be 100 * 5/300 ≈ 1.67. A configurable time penalty (default: -0.1) is also added to the distance-based reward at each time step, so if the ball does not move in a time step, the reward is -0.1 rather than 0. This incentivizes solving the game faster.
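The reward described above can be sketched as follows (a minimal illustration of the formula, not the library's internal code; the function name and parameters are hypothetical):

```python
def pistonball_reward(prev_dist, curr_dist, start_dist, time_penalty=-0.1):
    """Shared per-step reward: the percentage of the initial
    ball-to-wall distance covered this step, plus a constant
    time penalty (negative by default)."""
    distance_reward = 100 * (prev_dist - curr_dist) / start_dist
    return distance_reward + time_penalty

# Worked example from the text: ball starts 300 px from the wall,
# moves from 180 px to 175 px this step -> distance reward ~1.67,
# before the -0.1 time penalty is added.
r = pistonball_reward(prev_dist=180, curr_dist=175, start_dist=300)
```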
Pistonball uses the Chipmunk physics engine, so the physics are about as realistic as in the game Angry Birds.
Keys `a` and `d` control which piston is selected (initially the rightmost piston is selected), and keys `w` and `s` move the selected piston in the vertical direction.
Arguments¶
```python
pistonball_v6.env(n_pistons=20, time_penalty=-0.1, continuous=True,
                  random_drop=True, random_rotate=True, ball_mass=0.75,
                  ball_friction=0.3, ball_elasticity=1.5, max_cycles=125)
```
n_pistons
: The number of pistons (agents) in the environment.

time_penalty
: Amount of reward added to each piston at each time step. A larger-magnitude (more negative) penalty puts more weight on getting the ball across the screen quickly to end the game.

continuous
: If True, each piston's action is a real value between -1 and 1 which is added to the piston height. If False, the action is a discrete value that moves the piston a unit up or down, or keeps it still.

random_drop
: If True, the ball will initially spawn at a random x position. If False, the ball will always spawn at x=800.

random_rotate
: If True, the ball will spawn with random angular momentum.

ball_mass
: Sets the mass of the ball physics object.

ball_friction
: Sets the friction of the ball physics object.

ball_elasticity
: Sets the elasticity of the ball physics object.

max_cycles
: After max_cycles steps, all agents will return done.
Version History¶
v6: Fix ball bouncing off of left wall.
v5: Ball moving into the left column due to physics engine imprecision no longer gives additional reward
v4: Changed default arguments for max_cycles and continuous, bumped PyMunk version (1.6.0)
v3: Refactor, added number of pistons argument, minor visual changes (1.5.0)
v2: Misc fixes, bumped PyGame and PyMunk version (1.4.0)
v1: Fix to continuous mode (1.0.1)
v0: Initial versions release (1.0.0)
Usage¶
AEC¶
```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        # this is where you would insert your policy
        action = env.action_space(agent).sample()

    env.step(action)
env.close()
```
Parallel¶
```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env(render_mode="human")
observations, infos = env.reset()

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```
API¶
- class pettingzoo.butterfly.pistonball.pistonball.raw_env(n_pistons=20, time_penalty=-0.1, continuous=True, random_drop=True, random_rotate=True, ball_mass=0.75, ball_friction=0.3, ball_elasticity=1.5, max_cycles=125, render_mode=None)[source]¶
- action_space(agent)[source]¶
Takes in agent and returns the action space for that agent.
MUST return the same value for the same agent name
Default implementation is to return the action_spaces dict
- close()[source]¶
Closes any resources that should be released.
Closes the rendering window, subprocesses, network connections, or any other resources that should be released.
- observation_space(agent)[source]¶
Takes in agent and returns the observation space for that agent.
MUST return the same value for the same agent name
Default implementation is to return the observation_spaces dict
- observe(agent)[source]¶
Returns the observation an agent currently can make.
last() calls this function.
- render()[source]¶
Renders the environment as specified by self.render_mode.
Render mode can be ‘human’ to display a window. Other render modes in the default environments are ‘rgb_array’, which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’, which returns the strings printed (specific to classic environments).