Go

This environment is part of the classic environments. Please read that page first for general information.

Import               from pettingzoo.classic import go_v5
Actions              Discrete
Parallel API         Yes
Manual Control       No
Agents               agents = ['black_0', 'white_0']
Number of Agents     2
Action Shape         Discrete(362)
Action Values        Discrete(362)
Observation Shape    (19, 19, 3)
Observation Values   [0, 1]

Go is a board game with 2 players, black and white. The black player starts by placing a black stone at an empty board intersection. The white player follows by placing a stone of their own, aiming to either surround more territory than their opponent or capture the opponent’s stones. The game ends if both players sequentially decide to pass.

Our implementation is a wrapper for MiniGo.

Arguments

Go takes two optional arguments: the board size (int) and the komi compensation points (float). The default values for the board size and komi are 19 and 7.5, respectively.

go_v5.env(board_size=19, komi=7.5)

board_size: The length of each side of the board.

komi: The number of points given to white to compensate it for the disadvantage inherent to moving second. 7.5 is the standard value for Chinese tournament Go, but may not be perfectly balanced.
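
For example, a faster 9x9 game can be constructed by overriding both arguments (a minimal sketch):

from pettingzoo.classic import go_v5

# 9x9 board with the standard komi; both arguments are optional.
env = go_v5.env(board_size=9, komi=7.5)
env.reset(seed=0)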

Observation Space

The observation is a dictionary containing an 'observation' element, the usual RL observation described below, and an 'action_mask' element holding the legal moves, described in the Legal Actions Mask section.

The main observation has a shape of (N, N, 3), where N is the board size. The first plane, (:, :, 0), represents the stones of the current player, while the second plane, (:, :, 1), encodes the stones of the opponent. The third plane, (:, :, 2), is all 1s if the current player is black_0 and all 0s if the current player is white_0. The state of the board is represented with the top left corner as (0, 0). For example, a (9, 9) board is

   0 1 2 3 4 5 6 7 8
 0 . . . . . . . . .  0
 1 . . . . . . . . .  1
 2 . . . . . . . . .  2
 3 . . . . . . . . .  3
 4 . . . . . . . . .  4
 5 . . . . . . . . .  5
 6 . . . . . . . . .  6
 7 . . . . . . . . .  7
 8 . . . . . . . . .  8
   0 1 2 3 4 5 6 7 8

Plane   Description
0       Current player's stones (0: no stone, 1: stone)
1       Opponent's stones (0: no stone, 1: stone)
2       Player color (0: white, 1: black)

When rendering, the board coordinates follow the GTP (Go Text Protocol) convention.
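
A minimal sketch of unpacking the three observation planes, using the same AEC calls as the Usage section below; the variable names are illustrative only:

from pettingzoo.classic import go_v5

env = go_v5.env(board_size=9)
env.reset(seed=42)

observation, reward, termination, truncation, info = env.last()
board = observation["observation"]        # shape (9, 9, 3), values in [0, 1]
my_stones = board[:, :, 0]                # 1 wherever the current player has a stone
opponent_stones = board[:, :, 1]          # 1 wherever the opponent has a stone
current_player_is_black = bool(board[0, 0, 2])  # plane 2 is all 1s for black_0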

Action Space

Similar to the observation space, the action space is dependent on the board size N.

Action ID                    Description
\(0 \ldots (N-1)\)           Place a stone on the 1st row of the board.
                             0: (0,0), 1: (0,1), …, N-1: (0,N-1)
\(N \ldots (2N-1)\)          Place a stone on the 2nd row of the board.
                             N: (1,0), N+1: (1,1), …, 2N-1: (1,N-1)
…                            …
\((N^2-N) \ldots (N^2-1)\)   Place a stone on the Nth row of the board.
                             N^2-N: (N-1,0), N^2-N+1: (N-1,1), …, N^2-1: (N-1,N-1)
\(N^2\)                      Pass

For example, you would use action 4 to place a stone on the board at the (0,4) location, or action N^2 to pass. You can transform a non-pass action a back into its 2D (row, column) coordinate by computing (a//N, a%N), as in the sketch below. The total action space is \(N^2+1\).
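
These conversions are one-liners; the helpers below are hypothetical convenience functions, not part of the environment's API:

N = 19          # board size
PASS = N * N    # the final action id is the pass move

def action_to_coord(action, n=N):
    # Decode a non-pass action id into its (row, column) coordinate.
    return (action // n, action % n)

def coord_to_action(row, col, n=N):
    # Encode a (row, column) coordinate back into an action id.
    return row * n + col

assert action_to_coord(4) == (0, 4)
assert coord_to_action(N - 1, N - 1) == N * N - 1  # last placement action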

Rewards

Winner   +1
Loser    -1
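
Playing a game out with random legal moves shows where the reward lands; a sketch reusing the AEC loop from the Usage section:

from pettingzoo.classic import go_v5

env = go_v5.env(board_size=9)
env.reset(seed=0)

final_rewards = {}
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    final_rewards[agent] = reward  # keep the most recent reward per agent
    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample(observation["action_mask"])
    env.step(action)

print(final_rewards)  # e.g. {'black_0': 1, 'white_0': -1} if black won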

Version History

  • v5: Changed observation space to proper AlphaZero style frame stacking (1.11.0)

  • v4: Fixed bug in how black and white pieces were saved in observation space (1.10.0)

  • v3: Fixed bug in arbitrary calls to observe() (1.8.0)

  • v2: Legal action mask in observation replaced illegal move list in infos (1.5.0)

  • v1: Bumped version of all environments due to adoption of new agent iteration scheme where all agents are iterated over after they are done (1.4.0)

  • v0: Initial versions release (1.0.0)

Usage

AEC

from pettingzoo.classic import go_v5

env = go_v5.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        mask = observation["action_mask"]
        # this is where you would insert your policy
        action = env.action_space(agent).sample(mask)

    env.step(action)
env.close()

API

class pettingzoo.classic.go.go.env(**kwargs)
class pettingzoo.classic.go.go.raw_env(board_size: int = 19, komi: float = 7.5, render_mode: str | None = None, screen_height: int | None = 800)

action_space(agent)

Takes in an agent name and returns the action space for that agent.

Must return the same value for the same agent name.

The default implementation returns the entry for that agent from the action_spaces dict.
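
For Go, both players share one action space, so the calls below should print the same Discrete(362) space for either agent on the default 19x19 board (a quick sketch):

from pettingzoo.classic import go_v5

env = go_v5.env()
env.reset(seed=0)
print(env.action_space("black_0"))  # Discrete(362): 19*19 placements + pass
print(env.action_space("white_0"))  # same space for the other player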

close()

Closes any resources that should be released, such as the rendering window, subprocesses, or network connections.

observation_space(agent)

Takes in an agent name and returns the observation space for that agent.

Must return the same value for the same agent name.

The default implementation returns the entry for that agent from the observation_spaces dict.

observe(agent)

Returns the observation that the specified agent can currently make.

last() calls this function.
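
observe() can also be called directly; a small sketch on the default board:

from pettingzoo.classic import go_v5

env = go_v5.env()
env.reset(seed=0)
obs = env.observe("black_0")
print(obs["observation"].shape)  # (19, 19, 3)
print(obs["action_mask"].shape)  # (362,) legal-move mask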

render()

Renders the environment as specified by self.render_mode.

The 'human' render mode displays a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the printed strings (specific to classic environments).
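
A minimal sketch of grabbing a frame, assuming this environment supports the 'rgb_array' mode:

from pettingzoo.classic import go_v5

env = go_v5.env(render_mode="rgb_array")
env.reset(seed=0)
frame = env.render()  # numpy array of the current board drawing
print(frame.shape)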

reset(seed=None, options=None)

Resets the environment to a starting state.

step(action)

Accepts and executes the action of the current agent_selection in the environment.

Automatically switches control to the next agent.