Rock Paper Scissors¶
This environment is part of the classic environments. Please read that page first for general information.
| Import             | `from pettingzoo.classic import rps_v2` |
|--------------------|------------------------------------------|
| Actions            | Discrete                                 |
| Parallel API       | Yes                                      |
| Manual Control     | No                                       |
| Agents             | `agents= ['player_0', 'player_1']`       |
| Agents             | 2                                        |
| Action Shape       | Discrete(3)                              |
| Action Values      | Discrete(3)                              |
| Observation Shape  | Discrete(4)                              |
| Observation Values | Discrete(4)                              |
Rock, Paper, Scissors is a 2-player hand game where each player chooses either rock, paper or scissors and reveals their choices simultaneously. If both players make the same choice, then it is a draw. However, if their choices are different, the winner is determined as follows: rock beats scissors, scissors beat paper, and paper beats rock.
The game can be expanded to have extra actions by adding new action pairs. Adding the new actions in pairs allows for a more balanced game. This means that the final game will have an odd number of actions and each action wins over exactly half of the other actions while being defeated by the other half. The most common expansion of this game is Rock, Paper, Scissors, Lizard, Spock, in which only one extra action pair is added.
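The balance claim can be made concrete with a small helper. The sketch below assumes one common convention for generalized RPS, namely that action a beats action b whenever (a - b) mod n is odd; the environment's internal pairing may be encoded differently, so treat this as an illustration of the balance argument rather than the library's exact rule.

```python
def beats(a: int, b: int, n: int) -> bool:
    """Illustrative odd-offset win rule for an n-action game (n odd).

    Each action beats the (n - 1) / 2 actions at odd cyclic distances
    below it, so every action wins against exactly half of the others.
    Not necessarily the rule used inside rps_v2.
    """
    return (a - b) % n % 2 == 1

# Standard 3-action game: 0 = rock, 1 = paper, 2 = scissors.
assert beats(0, 2, 3)  # rock beats scissors
assert beats(1, 0, 3)  # paper beats rock
assert beats(2, 1, 3)  # scissors beats paper

# With n = 5, every action still beats exactly (5 - 1) / 2 = 2 others.
n = 5
assert all(sum(beats(a, b, n) for b in range(n) if b != a) == 2 for a in range(n))
```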
Arguments¶
```python
rps_v2.env(num_actions=3, max_cycles=15)
```
num_actions
: number of actions applicable in the game. The default value is 3, for the standard game of Rock, Paper, Scissors. This argument must be an odd integer greater than or equal to 3. If the value given is 5, the game is expanded to Rock, Paper, Scissors, Lizard, Spock.
max_cycles
: after `max_cycles` steps, all agents will return done.
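For instance, the five-action variant described above can be created as follows (the `max_cycles` value here is arbitrary):

```python
from pettingzoo.classic import rps_v2

# Rock, Paper, Scissors, Lizard, Spock: 5 actions per player.
env = rps_v2.env(num_actions=5, max_cycles=25)
```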
Observation Space¶
Rock, Paper, Scissors¶
If 3 actions are required, the game played is the standard Rock, Paper, Scissors. The observation is the last opponent action, and its space is a scalar value with 4 possible values. Since both players reveal their choices at the same time, the observation is None until both players have acted; the value 3 encodes this no-action-yet state. Rock is represented by 0, paper by 1 and scissors by 2.
| Value | Observation |
|-------|-------------|
| 0     | Rock        |
| 1     | Paper       |
| 2     | Scissors    |
| 3     | None        |
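The mapping in this table can be spelled out directly in code; the helper below is purely illustrative and not part of the rps_v2 API:

```python
# Value -> meaning for the standard 3-action game, mirroring the table above.
OBS_NAMES = ["Rock", "Paper", "Scissors", "None"]

def describe_observation(obs: int) -> str:
    return OBS_NAMES[obs]

assert describe_observation(3) == "None"  # opponent has not acted yet
```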
Expanded Game¶
If the number of actions required in the game is greater than 3, the observation is still the last opponent action, and its space is a scalar with n + 1 possible values, where n is the number of actions. The observation is likewise None until both players have acted, and the largest value in the space, n, represents no action taken yet. The additional actions are encoded in increasing order, starting from the Rock action at 0. If 5 actions are required, the game is expanded to Rock, Paper, Scissors, Lizard, Spock. The following table shows an example of an observation space with 7 possible actions.
| Value | Observation |
|-------|-------------|
| 0     | Rock        |
| 1     | Paper       |
| 2     | Scissors    |
| 3     | Lizard      |
| 4     | Spock       |
| 5     | Action_6    |
| 6     | Action_7    |
| 7     | None        |
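For an arbitrary number of actions, the same table can be generated programmatically. The sketch below follows the naming pattern used above, labeling actions past Spock as Action_k; the names are documentation labels only, not values exposed by the environment:

```python
def observation_names(num_actions: int) -> list[str]:
    """Build the value -> meaning table for an n-action game.

    Values 0 .. num_actions - 1 are the opponent's possible last actions;
    the largest value, num_actions, means no action has been taken yet.
    """
    base = ["Rock", "Paper", "Scissors", "Lizard", "Spock"]
    names = [base[i] if i < len(base) else f"Action_{i + 1}" for i in range(num_actions)]
    return names + ["None"]

assert observation_names(7) == [
    "Rock", "Paper", "Scissors", "Lizard", "Spock", "Action_6", "Action_7", "None"
]
```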
Action Space¶
Rock, Paper, Scissors¶
The action space is a scalar value with 3 possible values. The values are encoded as follows: Rock is 0, paper is 1 and scissors is 2.
| Value | Action   |
|-------|----------|
| 0     | Rock     |
| 1     | Paper    |
| 2     | Scissors |
Expanded Game¶
The action space is a scalar value with n possible values, where n is the number of actions. The values for 7 possible actions are encoded as in the following table.
| Value | Action   |
|-------|----------|
| 0     | Rock     |
| 1     | Paper    |
| 2     | Scissors |
| 3     | Lizard   |
| 4     | Spock    |
| 5     | Action_6 |
| 6     | Action_7 |
Rewards¶
| Winner | Loser |
|--------|-------|
| +1     | -1    |
If the game ends in a draw, both players will receive a reward of 0.
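Combining the reward table with a win predicate gives the per-step payoffs. The sketch below reuses the illustrative odd-offset rule from the expansion section; PettingZoo's internal arithmetic may be written differently, but the payoffs match the table above:

```python
def step_rewards(action_0: int, action_1: int, n: int) -> tuple[int, int]:
    """Rewards for (first player, second player): +1 winner, -1 loser, 0 on a draw."""
    if action_0 == action_1:
        return (0, 0)  # draw
    if (action_0 - action_1) % n % 2 == 1:  # illustrative odd-offset win rule
        return (1, -1)
    return (-1, 1)

assert step_rewards(0, 2, 3) == (1, -1)  # rock beats scissors
assert step_rewards(1, 1, 3) == (0, 0)   # draw
```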
Version History¶
v2: Merge RPS and rock paper lizard scissors spock environments, add num_actions and max_cycles arguments (1.9.0)
v1: Bumped version of all environments due to adoption of new agent iteration scheme where all agents are iterated over after they are done (1.4.0)
v0: Initial versions release (1.0.0)
Usage¶
AEC¶
```python
from pettingzoo.classic import rps_v2

env = rps_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        # this is where you would insert your policy
        action = env.action_space(agent).sample()

    env.step(action)
env.close()
```
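Since the observation is the opponent's last action, the random sampling above can be replaced with a hand-written policy. The counter-policy below plays the action that beats the opponent's previous move under the standard 0 = rock, 1 = paper, 2 = scissors encoding, and samples randomly while the observation still reads 3 (no move seen yet). It is a usage sketch, not a recommended strategy:

```python
from pettingzoo.classic import rps_v2

num_actions = 3
env = rps_v2.env()
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    elif observation == num_actions:
        # No opponent action observed yet: sample uniformly.
        action = env.action_space(agent).sample()
    else:
        # Play the action one step above the opponent's last move,
        # which beats it under the rock < paper < scissors < rock cycle.
        action = (observation + 1) % num_actions

    env.step(action)
env.close()
```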
Parallel¶
```python
from pettingzoo.classic import rps_v2

env = rps_v2.parallel_env(render_mode="human")
observations, infos = env.reset()

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```
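Because every step pays out +1/-1 or 0/0, episode scores can be tracked by summing the rewards dict inside the parallel loop. This is a usage sketch building on the loop above:

```python
from pettingzoo.classic import rps_v2

env = rps_v2.parallel_env()
observations, infos = env.reset(seed=42)
totals = {agent: 0 for agent in env.agents}

while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    for agent, reward in rewards.items():
        totals[agent] += reward  # accumulate the +1/-1/0 step rewards

print(totals)  # per-agent episode totals
env.close()
```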
API¶
- class pettingzoo.classic.rps.rps.raw_env(num_actions: int | None = 3, max_cycles: int | None = 15, render_mode: str | None = None, screen_height: int | None = 800)[source]¶
Two-player environment for rock paper scissors.
Expandable environment to rock paper scissors lizard spock action_6 action_7 … The observation is simply the last opponent action.
- action_space(agent)[source]¶
Takes in an agent and returns the action space for that agent.
MUST return the same value for the same agent name.
The default implementation returns the action_spaces dict.
- close()[source]¶
Closes any resources that should be released.
Closes the rendering window, subprocesses, network connections, or any other resources that should be released.
- observation_space(agent)[source]¶
Takes in an agent and returns the observation space for that agent.
MUST return the same value for the same agent name.
The default implementation returns the observation_spaces dict.
- observe(agent)[source]¶
Returns the observation an agent currently can make.
last() calls this function.
- render()[source]¶
Renders the environment as specified by self.render_mode.
Render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).