Tic Tac Toe#


This environment is part of the classic environments. Please read that page first for general information.


Import              from pettingzoo.classic import tictactoe_v3
Parallel API
Manual Control
Agents              agents= ['player_1', 'player_2']
Action Shape
Action Values       [0, 8]
Observation Shape   (3, 3, 2)
Observation Values  [0, 1]

Tic-tac-toe is a simple turn-based strategy game in which two players, X and O, take turns marking spaces on a 3 x 3 grid. The first player to place three of their marks in a horizontal, vertical, or diagonal line wins.
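The winning condition above can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the environment's own code: the `winner` helper and the nested-list board of "X", "O", or None are assumptions made for the example.

```python
from typing import List, Optional

# Hypothetical board representation for this sketch:
# a 3 x 3 nested list holding "X", "O", or None for an empty cell.
Board = List[List[Optional[str]]]


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that mark fills a full line, otherwise None."""
    lines = []
    lines.extend(board)                                                # three rows
    lines.extend([[board[r][c] for r in range(3)] for c in range(3)])  # three columns
    lines.append([board[i][i] for i in range(3)])                      # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])                  # anti-diagonal
    for line in lines:
        if line[0] is not None and line[0] == line[1] == line[2]:
            return line[0]
    return None


example = [
    ["X", "O", None],
    ["O", "X", None],
    [None, None, "X"],
]
print(winner(example))  # X holds the main diagonal
```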

Observation Space#

The observation is a dictionary with two elements: 'observation', which is the usual RL observation described below, and 'action_mask', which holds the legal moves and is described in the Legal Actions Mask section.

The main observation consists of 2 planes of the 3x3 board. For player_1, the first plane represents the placement of Xs and the second plane the placement of Os. Each cell holds 0 or 1: in the first plane, 1 indicates that an X has been placed in that cell and 0 that it has not; in the second plane, 1 indicates that an O has been placed in that cell and 0 that it has not.

For player_2, the observation is the same, but Xs and Os swap positions, so Os are encoded in the first plane and Xs in the second. Each agent therefore always sees its own marks in the first plane, which allows for self-play.
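To make the perspective swap concrete, here is a minimal pure-Python sketch of the encoding. The `encode_observation` helper, the integer cell codes, and the [row][col][plane] axis order are assumptions made for this illustration; the environment itself may lay out the axes differently.

```python
from typing import List

EMPTY, X, O = 0, 1, 2  # hypothetical cell codes used only in this sketch


def encode_observation(board: List[List[int]], agent: str) -> List[List[List[int]]]:
    """Build a 3 x 3 x 2 nested list of 0/1 values.

    The first plane marks the observing agent's own pieces and the second
    plane marks the opponent's. player_1 is assumed to play X.
    """
    own, other = (X, O) if agent == "player_1" else (O, X)
    return [
        [[int(cell == own), int(cell == other)] for cell in row]
        for row in board
    ]


board = [
    [X, EMPTY, EMPTY],
    [EMPTY, O, EMPTY],
    [EMPTY, EMPTY, EMPTY],
]
obs_p1 = encode_observation(board, "player_1")
obs_p2 = encode_observation(board, "player_2")

# The X in the top-left cell sits in the first plane for player_1
# and in the second plane for player_2.
print(obs_p1[0][0], obs_p2[0][0])  # [1, 0] [0, 1]
```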

Action Space#

Each action from 0 to 8 represents placing either an X or O in the corresponding cell. The cells are indexed as follows:

0 | 3 | 6

1 | 4 | 7

2 | 5 | 8
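The grid above numbers cells down each column first. A small sketch of the conversion between an action index and a (row, column) pair follows; the helper names `action_to_cell` and `cell_to_action` are hypothetical and are not part of the environment.

```python
from typing import Tuple


def action_to_cell(action: int) -> Tuple[int, int]:
    """Map an action index 0..8 to (row, column) for the layout shown above."""
    if not 0 <= action <= 8:
        raise ValueError(f"action must be in [0, 8], got {action}")
    return action % 3, action // 3


def cell_to_action(row: int, col: int) -> int:
    """Inverse mapping: (row, column) back to the action index."""
    return col * 3 + row


# Reprint the index grid row by row to check the mapping.
for row in range(3):
    print(" | ".join(str(cell_to_action(row, col)) for col in range(3)))
```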






Rewards#

If the game ends in a draw, both players receive a reward of 0.

Version History#

  • v3: Fixed bug in arbitrary calls to observe() (1.8.0)

  • v2: Legal action mask in observation replaced illegal move list in infos (1.5.0)

  • v1: Bumped version of all environments due to adoption of new agent iteration scheme where all agents are iterated over after they are done (1.4.0)

  • v0: Initial release (1.0.0)



from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env(render_mode="human")
env.reset(seed=42)  # the environment must be reset before iterating

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        # a finished agent must step with None
        action = None
    else:
        mask = observation["action_mask"]
        # this is where you would insert your policy
        action = env.action_space(agent).sample(mask)

    env.step(action)

env.close()
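As one example of a policy that could replace the sampling call, the sketch below picks uniformly among the legal moves. It assumes the action mask is a length-9 sequence of 0/1 values in which 1 marks a legal action; `masked_random_policy` is a hypothetical helper, not part of PettingZoo.

```python
import random
from typing import Optional, Sequence


def masked_random_policy(
    action_mask: Sequence[int], rng: Optional[random.Random] = None
) -> int:
    """Pick a uniformly random action among those the mask marks as legal.

    Assumes action_mask[i] is truthy when action i is legal.
    """
    rng = rng or random.Random()
    legal = [action for action, allowed in enumerate(action_mask) if allowed]
    if not legal:
        raise ValueError("the mask contains no legal actions")
    return rng.choice(legal)


mask = [0, 1, 0, 0, 1, 0, 0, 0, 1]  # only cells 1, 4, and 8 are free
action = masked_random_policy(mask, random.Random(0))
print(action in (1, 4, 8))  # True
```

Inside the loop, `action = masked_random_policy(observation["action_mask"])` would take the place of the `sample(mask)` call.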



class pettingzoo.classic.tictactoe.tictactoe.env(render_mode=None)

class pettingzoo.classic.tictactoe.tictactoe.raw_env(render_mode: str | None = None, screen_height: int | None = 1000)

action_space(agent)

Takes in an agent and returns the action space for that agent.

MUST return the same value for the same agent name.

The default implementation returns the agent's entry from the action_spaces dict.
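The "same value for the same agent name" requirement is easiest to satisfy by building the spaces once and looking them up afterwards. The following is a toy sketch of that idea only: `ToySpaces` is hypothetical, and a plain `range(9)` stands in for a real space object.

```python
class ToySpaces:
    """Toy stand-in showing dict-backed space lookup; not PettingZoo code."""

    def __init__(self, agents):
        # Build one space object per agent up front. range(9) is a
        # placeholder for a real discrete space with nine actions.
        self.action_spaces = {agent: range(9) for agent in agents}

    def action_space(self, agent):
        # Every call with the same agent name returns the same stored object.
        return self.action_spaces[agent]


spaces = ToySpaces(["player_1", "player_2"])
first = spaces.action_space("player_1")
second = spaces.action_space("player_1")
print(first is second)  # True
```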


close()

Closes the rendering window, subprocesses, network connections, or any other resources that should be released.


observation_space(agent)

Takes in an agent and returns the observation space for that agent.

MUST return the same value for the same agent name.

The default implementation returns the agent's entry from the observation_spaces dict.


observe(agent)

Returns the observation the given agent can currently make.

last() calls this function.


render()

Renders the environment as specified by self.render_mode.

The render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).

reset(seed=None, options=None)

Resets the environment to a starting state.


step(action)

Accepts and executes the action of the current agent_selection in the environment.

Automatically switches control to the next agent.
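The automatic hand-off between agents can be pictured with a small stand-alone sketch. `TurnOrder` and its `advance` method are hypothetical and only mimic the alternation between the two players; they are not how the library implements agent selection.

```python
from itertools import cycle


class TurnOrder:
    """Toy stand-in for agent selection; not PettingZoo's implementation."""

    def __init__(self, agents):
        self._order = cycle(agents)
        self.agent_selection = next(self._order)  # first agent to act

    def advance(self):
        """Hand control to the next agent, as would happen after each step."""
        self.agent_selection = next(self._order)
        return self.agent_selection


turns = TurnOrder(["player_1", "player_2"])
seen = [turns.agent_selection]
for _ in range(3):
    seen.append(turns.advance())
print(seen)  # ['player_1', 'player_2', 'player_1', 'player_2']
```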