AEC API#
By default, PettingZoo models games as Agent Environment Cycle (AEC) environments. This allows PettingZoo to support any type of game that multi-agent RL can consider.
PettingZoo Classic provides standard examples of AEC environments for turn-based games, many of which implement Illegal Action Masking.
We provide a tutorial for creating a simple Rock-Paper-Scissors AEC environment, showing how games with simultaneous actions can also be represented with AEC environments.
Usage#
AEC environments can be interacted with as follows:
from pettingzoo.classic import rps_v2

env = rps_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)
env.close()
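The sampled action above is a placeholder for a real policy. As a minimal sketch of where a trained policy would plug in, the loop below routes each agent's observation through a policy function; random_policy is a hypothetical stand-in for illustration, not part of the PettingZoo API:

from pettingzoo.classic import rps_v2

env = rps_v2.env(render_mode="human")
env.reset(seed=42)

def random_policy(observation, agent):
    # Hypothetical stand-in for a trained policy:
    # ignores the observation and samples a random action.
    return env.action_space(agent).sample()

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # terminated or truncated agents must receive None
    else:
        action = random_policy(observation, agent)
    env.step(action)
env.close()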
Action Masking#
AEC environments often include action masks, which mark the valid and invalid actions for the agent.
To sample actions using action masking:
from pettingzoo.classic import chess_v5

env = chess_v5.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        # invalid action masking is optional and environment-dependent
        if "action_mask" in info:
            mask = info["action_mask"]
        elif isinstance(observation, dict) and "action_mask" in observation:
            mask = observation["action_mask"]
        else:
            mask = None
        action = env.action_space(agent).sample(mask)  # this is where you would insert your policy

    env.step(action)
env.close()
Note: action masking is optional, and can be implemented using either observation or info.
PettingZoo Classic environments store action masks in the observation dict:
    mask = observation["action_mask"]
Shimmy’s OpenSpiel environments store action masks in the info dict:
    mask = info["action_mask"]
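Because the two conventions differ, it can be convenient to wrap the lookup in a small helper. A minimal sketch, mirroring the branching in the chess example above (the get_action_mask name is ours, not part of PettingZoo):

def get_action_mask(observation, info):
    # Return the action mask from info or observation, or None if absent.
    if "action_mask" in info:
        return info["action_mask"]
    if isinstance(observation, dict) and "action_mask" in observation:
        return observation["action_mask"]
    return None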
To implement action masking in a custom environment, see Environment Creation: Action Masking.
For more information on action masking, see A Closer Look at Invalid Action Masking in Policy Gradient Algorithms (Huang, 2022).
AECEnv#
Attributes#
- AECEnv.agents: list[str]#
A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed).
- Type:
List[AgentID]
- AECEnv.num_agents#
The length of the agents list.
- AECEnv.possible_agents: list[str]#
A list of all possible agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting.
- Type:
List[AgentID]
- AECEnv.max_num_agents#
The length of the possible_agents list.
- AECEnv.agent_selection: str#
An attribute of the environment corresponding to the currently selected agent that an action can be taken for.
- Type:
AgentID
- AECEnv.terminations: dict[str, bool]#
A dict of the termination state of every current agent at the time called, keyed by name.
- Type:
Dict[AgentID, bool]
- AECEnv.truncations: dict[str, bool]#
A dict of the truncation state of every current agent at the time called, keyed by name.
- Type:
Dict[AgentID, bool]
- AECEnv.rewards: dict[str, float]#
A dict of the rewards of every current agent at the time called, keyed by name. Contains the instantaneous reward generated after the last step. Note that agents can be added or removed from this attribute. last() does not directly access this attribute; rather, the returned reward is stored in an internal variable. The rewards structure looks like:
{0:[first agent reward], 1:[second agent reward] ... n-1:[nth agent reward]}
- Type:
Dict[AgentID, float]
- AECEnv.infos: dict[str, dict[str, Any]]#
A dict of info for each current agent, keyed by name. Each agent’s info is also a dict. Note that agents can be added or removed from this attribute. last() accesses this attribute. The returned dict looks like:
infos = {0:[first agent info], 1:[second agent info] ... n-1:[nth agent info]}
- Type:
Dict[AgentID, Dict[str, Any]]
- AECEnv.observation_spaces: dict[str, gymnasium.spaces.space.Space]#
A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.
- Type:
Dict[AgentID, gymnasium.spaces.Space]
- AECEnv.action_spaces: dict[str, gymnasium.spaces.space.Space]#
A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.
- Type:
Dict[AgentID, gymnasium.spaces.Space]
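To make these attributes concrete, the snippet below prints them for a small Classic environment. The values shown in the comments are what rps_v2 happens to produce after a reset and will differ per environment:

from pettingzoo.classic import rps_v2

env = rps_v2.env()
env.reset(seed=42)

print(env.possible_agents)   # e.g. ['player_0', 'player_1']
print(env.max_num_agents)    # 2
print(env.agents)            # current (live) agents
print(env.agent_selection)   # agent whose turn it is to act
print(env.rewards)           # instantaneous rewards, e.g. {'player_0': 0, 'player_1': 0}
print(env.terminations)      # per-agent termination flags
print(env.infos)             # per-agent info dicts
print(env.action_space(env.agent_selection))  # Discrete(3) for Rock-Paper-Scissors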
Methods#
- AECEnv.step(action: ActionType) → None [source]#
Accepts and executes the action of the current agent_selection in the environment.
Automatically switches control to the next agent.
- AECEnv.reset(seed: int | None = None, options: dict | None = None) → None [source]#
Resets the environment to a starting state.
- AECEnv.observe(agent: str) → ObsType | None [source]#
Returns the observation an agent currently can make.
last() calls this function.
- AECEnv.render() → None | np.ndarray | str | list [source]#
Renders the environment as specified by self.render_mode.
Render mode can be ‘human’ to display a window. Other render modes in the default environments are ‘rgb_array’, which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’, which returns the strings printed (specific to classic environments).
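As a short sketch of calling observe() and render() directly, outside the main loop (rps_v2 is used purely as an example; with render_mode="human" the output is displayed rather than returned):

from pettingzoo.classic import rps_v2

env = rps_v2.env(render_mode="human")
env.reset(seed=42)

# observe() returns the observation the given agent can currently make;
# last() calls this internally for the selected agent.
obs = env.observe(env.agent_selection)
print(obs)

# render() draws the environment according to self.render_mode;
# with render_mode="human" it displays output directly and returns None.
env.render()
env.close()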