Cooperative Pong#
This environment is part of the butterfly environments. Please read that page first for general information.
| | |
| --- | --- |
| Import | `from pettingzoo.butterfly import cooperative_pong_v5` |
| Actions | Discrete |
| Parallel API | Yes |
| Manual Control | Yes |
| Agents | `agents=['paddle_0', 'paddle_1']` |
| Agents | 2 |
| Action Shape | Discrete(3) |
| Action Values | [0, 2] |
| Observation Shape | (280, 480, 3) |
| Observation Values | [0, 255] |
| State Shape | (560, 960, 3) |
| State Values | [0, 255] |
Cooperative Pong is a simple game of pong where the objective is to keep the ball in play for as long as possible. The game ends when the ball goes out of bounds from either the left or right edge of the screen. There are two agents (paddles), one that moves along the left edge and one that moves along the right edge of the screen. All collisions of the ball are elastic. On each reset, the ball starts at the center of the screen moving in a random direction. To make learning a little more challenging, the right paddle is shaped like a tiered cake by default.
The observation space of each agent is its own half of the screen. There are three possible actions for each agent (stay, move up, move down). If the ball stays within bounds, each agent receives a reward of `max_reward / max_cycles` (default 0.11) at each timestep. Otherwise, each agent receives a penalty of `off_screen_penalty` (default -10) and the game ends.
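The per-step figure follows directly from the arguments: 100 / 900 ≈ 0.11. A minimal sketch that inspects the spaces (the shapes shown in comments are those from the table above, assumed rather than re-verified):

```python
from pettingzoo.butterfly import cooperative_pong_v5

env = cooperative_pong_v5.env()
env.reset(seed=42)

agent = env.agents[0]
print(env.observation_space(agent))  # Box(0, 255, (280, 480, 3), uint8), per the table above
print(env.action_space(agent))       # Discrete(3)

# With the defaults max_reward=100 and max_cycles=900, the per-step
# reward while the ball stays in play is 100 / 900 ≈ 0.11.
print(100 / 900)
```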
Manual Control#
Move the left paddle using the W and S keys. Move the right paddle using the UP and DOWN arrow keys.
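Recent PettingZoo releases also expose a ManualPolicy helper for the butterfly environments. Assuming `cooperative_pong_v5.ManualPolicy` follows the same pattern as the other butterfly environments, keyboard control can be wired into the AEC loop like this:

```python
import pygame

from pettingzoo.butterfly import cooperative_pong_v5

env = cooperative_pong_v5.env(render_mode="human")
env.reset(seed=42)

clock = pygame.time.Clock()
# ManualPolicy drives one agent from the keyboard; its API here is
# assumed from the other butterfly environments.
manual_policy = cooperative_pong_v5.ManualPolicy(env)

for agent in env.agent_iter():
    clock.tick(env.metadata["render_fps"])
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    elif agent == manual_policy.agent:
        action = manual_policy(observation, agent)  # read the keyboard
    else:
        action = env.action_space(agent).sample()

    env.step(action)
env.close()
```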
Arguments#
```python
cooperative_pong_v5.env(
    ball_speed=9,
    left_paddle_speed=12,
    right_paddle_speed=12,
    cake_paddle=True,
    max_cycles=900,
    bounce_randomness=False,
    max_reward=100,
    off_screen_penalty=-10,
)
```
`ball_speed`: Speed of the ball (in pixels).

`left_paddle_speed`: Speed of the left paddle (in pixels).

`right_paddle_speed`: Speed of the right paddle (in pixels).

`cake_paddle`: If True, the right paddle takes the shape of a four-tiered wedding cake.

`max_cycles`: After `max_cycles` steps, all agents will return done.

`bounce_randomness`: If True, each collision of the ball with a paddle adds a small random angle to the ball's direction, with its speed unchanged.

`max_reward`: Total reward given to each agent over `max_cycles` timesteps.

`off_screen_penalty`: Negative reward penalty applied to each agent if the ball goes off the screen.
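For illustration (these particular values are not from the docs), the arguments compose like so; note how `max_reward` and `max_cycles` together determine the per-step reward:

```python
from pettingzoo.butterfly import cooperative_pong_v5

# A hypothetical easier configuration, for illustration only.
env = cooperative_pong_v5.env(
    ball_speed=6,             # slower ball
    cake_paddle=False,        # both paddles are flat rectangles
    bounce_randomness=True,   # small random angle added on paddle hits
    max_cycles=450,           # episode truncates after 450 steps
    max_reward=100,           # per-step reward becomes 100 / 450 ≈ 0.22
    off_screen_penalty=-5,    # softer penalty when the ball escapes
)
env.reset(seed=42)
```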
Version History#
v5: Fixed ball teleporting bugs

v4: Added max_reward and off_screen_penalty arguments and changed defaults; fixed a glitch where the ball would occasionally teleport; reward redesign (1.14.0)

v3: Changed observation space to include the entire screen (1.10.0)

v2: Misc fixes (1.4.0)

v1: Fixed bug in how `dones` were computed (1.3.1)

v0: Initial versions release (1.0.0)
Usage#
AEC#
```python
from pettingzoo.butterfly import cooperative_pong_v5

env = cooperative_pong_v5.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        # this is where you would insert your policy
        action = env.action_space(agent).sample()

    env.step(action)
env.close()
```
Parallel#
```python
from pettingzoo.butterfly import cooperative_pong_v5

env = cooperative_pong_v5.parallel_env(render_mode="human")
observations, infos = env.reset()

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```
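The summary table above also lists a global state of shape (560, 960, 3). Assuming this environment implements the optional `state()` method of the Parallel API (which the table suggests), the combined screen can be read directly:

```python
from pettingzoo.butterfly import cooperative_pong_v5

env = cooperative_pong_v5.parallel_env()
observations, infos = env.reset(seed=42)

# Global state vs. per-agent observation; shapes per the table above.
state = env.state()
print(state.shape)                        # (560, 960, 3)
print(observations[env.agents[0]].shape)  # (280, 480, 3)
env.close()
```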
API#
- class pettingzoo.butterfly.cooperative_pong.cooperative_pong.raw_env(**kwargs)

  - action_space(agent)

    Takes in an agent and returns the action space for that agent.

    MUST return the same value for the same agent name.

    The default implementation is to return the action_spaces dict.

  - close()

    Closes any resources that should be released: the rendering window, subprocesses, network connections, or any other resources held by the environment.

  - observation_space(agent)

    Takes in an agent and returns the observation space for that agent.

    MUST return the same value for the same agent name.

    The default implementation is to return the observation_spaces dict.

  - observe(agent)

    Returns the observation the agent can currently make. `last()` calls this function.

  - render()

    Renders the environment as specified by self.render_mode. Render mode can be `human` to display a window. Other render modes in the default environments are `rgb_array`, which returns a numpy array and is supported by all environments outside of classic, and `ansi`, which returns the strings printed (specific to classic environments).