Connect Four#
 
This environment is part of the classic environments. Please read that page first for general information.
| Import | `from pettingzoo.classic import connect_four_v3` |
|---|---|
| Actions | Discrete |
| Parallel API | Yes |
| Manual Control | No |
| Agents | `agents= ['player_0', 'player_1']` |
| Agents | 2 |
| Action Shape | (1,) |
| Action Values | Discrete(7) |
| Observation Shape | (6, 7, 2) |
| Observation Values | [0,1] |
Connect Four is a two-player, turn-based game in which players must connect four of their tokens vertically, horizontally, or diagonally. Players drop their tokens into a column of a standing grid, where each token falls until it reaches the bottom of the column or lands on an existing token. Players cannot place a token in a full column, and the game ends either when a player has made a sequence of four tokens or when all 7 columns have been filled.
Observation Space#
The observation is a dictionary which contains an 'observation' element, the usual RL observation described below, and an 'action_mask' element which holds the legal moves, described in the Legal Actions Mask section.
The main observation space is 2 planes of a 6x7 grid. Each plane represents a specific agent’s tokens: a 1 indicates that the agent has a token in that cell, and a 0 indicates that the cell is either empty or contains the other agent’s token.
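The sketch below unpacks the two planes from a freshly reset environment. It assumes the standard AEC accessors shown in the Usage section; the interpretation of plane 0 as the observing agent’s tokens follows the description above and should be treated as an assumption, not a guarantee.

```python
from pettingzoo.classic import connect_four_v3

env = connect_four_v3.env()
env.reset(seed=0)

agent = env.agent_selection
obs = env.observe(agent)

planes = obs["observation"]        # shape (6, 7, 2), values in {0, 1}
own_tokens = planes[:, :, 0]       # assumed: plane 0 = observing agent's tokens
opponent_tokens = planes[:, :, 1]  # assumed: plane 1 = opponent's tokens
empty_cells = 1 - (own_tokens + opponent_tokens)

print(planes.shape)                # (6, 7, 2)
print(int(empty_cells.sum()))      # 42 on an empty board
env.close()
```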
Legal Actions Mask#
The legal moves available to the current agent are found in the action_mask element of the dictionary observation. The action_mask is a binary vector where each index indicates whether the corresponding action is legal. The action_mask is all zeros for every agent except the one whose turn it is. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
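As a sketch of how the mask might be used beyond random sampling, the snippet below masks illegal columns out of a set of purely illustrative, randomly generated policy scores before choosing an action:

```python
import numpy as np
from pettingzoo.classic import connect_four_v3

env = connect_four_v3.env()
env.reset(seed=0)

observation, reward, termination, truncation, info = env.last()
mask = observation["action_mask"]  # binary vector of length 7, 1 = legal column

# Illustrative stand-in for a policy's output: one score per column.
scores = np.random.default_rng(0).random(len(mask))
scores[mask == 0] = -np.inf        # never pick an illegal column
action = int(np.argmax(scores))

env.step(action)
env.close()
```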
Action Space#
The action space is the set of integers from 0 to 6 (inclusive), where the action represents the column into which the current agent's token is dropped.
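For example, assuming the standard AEC API shown in the Usage section below, stepping with action 3 drops the current agent's token into the middle column:

```python
from pettingzoo.classic import connect_four_v3

env = connect_four_v3.env()
env.reset(seed=0)

agent = env.agent_selection
print(env.action_space(agent))  # Discrete(7)

env.step(3)                     # current agent drops a token into column index 3
env.close()
```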
Rewards#
If an agent successfully connects four of their tokens, they receive a reward of +1 and the opponent receives a reward of -1. If the game ends in a draw, both players receive a reward of 0.
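A minimal sketch of tracking these rewards over a random game (the agent names in the final comment are illustrative):

```python
from pettingzoo.classic import connect_four_v3

env = connect_four_v3.env()
env.reset(seed=42)

totals = {agent: 0 for agent in env.agents}

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    totals[agent] += reward
    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample(observation["action_mask"])
    env.step(action)

env.close()
print(totals)  # e.g. {'player_0': 1, 'player_1': -1}, or both 0 on a draw
```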
Version History#
- v3: Fixed bug in arbitrary calls to observe() (1.8.0) 
- v2: Legal action mask in observation replaced illegal move list in infos (1.5.0) 
- v1: Bumped version of all environments due to adoption of new agent iteration scheme where all agents are iterated over after they are done (1.4.0) 
- v0: Initial versions release (1.0.0) 
Usage#
AEC#
```python
from pettingzoo.classic import connect_four_v3

env = connect_four_v3.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        mask = observation["action_mask"]
        # this is where you would insert your policy
        action = env.action_space(agent).sample(mask)

    env.step(action)
env.close()
```
API#
- class pettingzoo.classic.connect_four.connect_four.raw_env(render_mode: str | None = None, screen_scaling: int = 9)
  - action_space(agent)
    Takes in agent and returns the action space for that agent. MUST return the same value for the same agent name. Default implementation is to return the action_spaces dict.
  - close()
    Closes any resources that should be released: the rendering window, subprocesses, network connections, or any other resources that should be released.
  - observation_space(agent)
    Takes in agent and returns the observation space for that agent. MUST return the same value for the same agent name. Default implementation is to return the observation_spaces dict.
  - observe(agent)
    Returns the observation an agent currently can make. last() calls this function.
  - render()
    Renders the environment as specified by self.render_mode. Render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).
 

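A rough usage sketch of the raw environment class documented above (the constructor arguments shown are the documented defaults; whether you want the unwrapped class rather than connect_four_v3.env() depends on your use case):

```python
from pettingzoo.classic import connect_four_v3

env = connect_four_v3.raw_env(render_mode=None, screen_scaling=9)
env.reset(seed=0)

agent = env.agent_selection
print(env.action_space(agent))                  # Discrete(7)
print(env.observe(agent)["observation"].shape)  # (6, 7, 2)
env.close()
```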