Parallel API¶
In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via <game>.parallel_env(). This API is based around the paradigm of Partially Observable Stochastic Games (POSGs); the details are similar to RLlib’s MultiAgent environment specification, except that we allow for different observation and action spaces between the agents.
For a comparison with the AEC API, see About AEC. For more information, see PettingZoo: A Standard API for Multi-Agent Reinforcement Learning.
PettingZoo Wrappers can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).
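As a minimal sketch of such a conversion (assuming the conversion functions in pettingzoo.utils.conversions and the Pistonball environment used later on this page), it looks roughly like this:

from pettingzoo.butterfly import pistonball_v6
from pettingzoo.utils.conversions import aec_to_parallel, parallel_to_aec

# AEC -> Parallel (only valid if the AEC env updates once per cycle)
aec_env = pistonball_v6.env()
parallel_env = aec_to_parallel(aec_env)

# Parallel -> AEC
parallel_env = pistonball_v6.parallel_env()
aec_env = parallel_to_aec(parallel_env)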
Examples¶
PettingZoo Butterfly provides standard examples of Parallel environments, such as Pistonball.
We provide tutorials for creating two custom Parallel environments: Rock-Paper-Scissors (Parallel) and a simple gridworld environment.
Usage¶
Parallel environments can be interacted with as follows:
from pettingzoo.butterfly import pistonball_v6

parallel_env = pistonball_v6.parallel_env(render_mode="human")
observations, infos = parallel_env.reset(seed=42)

while parallel_env.agents:
    # this is where you would insert your policy
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}

    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
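The random sampling above stands in for a real policy. As a sketch only, assuming a hypothetical policies dict that maps each agent name to a callable taking that agent’s observation and returning an action, the same loop becomes:

observations, infos = parallel_env.reset(seed=42)
while parallel_env.agents:
    # `policies` is hypothetical: {agent_name: callable(observation) -> action}
    actions = {agent: policies[agent](observations[agent]) for agent in parallel_env.agents}
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)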
ParallelEnv¶
- class pettingzoo.utils.env.ParallelEnv[source]¶
Parallel environment class.
It steps every live agent at once. If you are unsure whether you have implemented a ParallelEnv correctly, try running the parallel_api_test in the Developer documentation on the website; a minimal skeleton is sketched after this class reference.
- agents¶
A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed).
- Type:
list[AgentID]
- num_agents¶
The length of the agents list.
- Type:
int
- possible_agents¶
A list of all possible agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting.
- Type:
list[AgentID]
- max_num_agents¶
The length of the possible_agents list.
- Type:
int
- observation_spaces¶
A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.
- Type:
Dict[AgentID, gym.spaces.Space]
- action_spaces¶
A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.
- Type:
Dict[AgentID, gym.spaces.Space]
- step(actions: dict[AgentID, ActionType]) tuple[dict[AgentID, ObsType], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]] [source]¶
Receives a dictionary of actions keyed by the agent name.
Returns the observation dictionary, reward dictionary, terminated dictionary, truncated dictionary and info dictionary, where each dictionary is keyed by the agent.
- reset(seed: int | None = None, options: dict | None = None) tuple[dict[AgentID, ObsType], dict[AgentID, dict]] [source]¶
Resets the environment.
Returns a dictionary of observations and a dictionary of infos, each keyed by the agent name.
- render() None | ndarray | str | list [source]¶
Displays a rendered frame from the environment, if supported.
Alternate render modes in the default environments are ‘rgb_array’, which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’, which returns the strings printed (specific to classic environments).
- state() ndarray [source]¶
Returns the state.
The state is a global view of the environment, appropriate for centralized training, decentralized execution methods such as QMIX.
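To tie the attributes and methods above together, here is a minimal sketch of a custom ParallelEnv. The environment itself (a two-agent dummy game named simple_parallel_v0) and all of its details are illustrative assumptions, not part of PettingZoo.

import functools

import numpy as np
from gymnasium import spaces

from pettingzoo import ParallelEnv


class SimpleParallelEnv(ParallelEnv):
    """Illustrative two-agent dummy environment; not part of PettingZoo."""

    metadata = {"render_modes": ["ansi"], "name": "simple_parallel_v0"}

    def __init__(self, render_mode=None):
        self.possible_agents = ["player_0", "player_1"]
        self.render_mode = render_mode

    # Spaces are fixed per agent, so caching them is a common pattern.
    @functools.lru_cache(maxsize=None)
    def observation_space(self, agent):
        return spaces.Discrete(3)

    @functools.lru_cache(maxsize=None)
    def action_space(self, agent):
        return spaces.Discrete(3)

    def reset(self, seed=None, options=None):
        self.agents = self.possible_agents[:]
        self.timestep = 0
        observations = {agent: 0 for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, infos

    def step(self, actions):
        # Every live agent is stepped at once; all five return values are
        # dictionaries keyed by agent name.
        self.timestep += 1
        observations = {agent: actions[agent] for agent in self.agents}
        rewards = {agent: 0.0 for agent in self.agents}
        terminations = {agent: False for agent in self.agents}
        truncations = {agent: self.timestep >= 10 for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        if any(truncations.values()):
            self.agents = []
        return observations, rewards, terminations, truncations, infos

    def render(self):
        if self.render_mode == "ansi":
            return f"timestep {self.timestep}"

    def state(self):
        # Global view of the environment for centralized training,
        # decentralized execution methods such as QMIX.
        return np.array([self.timestep], dtype=np.float32)

Assuming such an implementation, parallel_api_test from pettingzoo.test can then be used to check it:

from pettingzoo.test import parallel_api_test

parallel_api_test(SimpleParallelEnv(), num_cycles=1000)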