Texas Hold’em No Limit

This environment is part of the classic environments. Please read that page first for general information.

Import: from pettingzoo.classic import texas_holdem_no_limit_v6
Actions: Discrete
Parallel API: Yes
Manual Control: No
Agents: agents= ['player_0', 'player_1']
Number of Agents: 2
Action Shape: Discrete(5)
Action Values: Discrete(5)
Observation Shape: (54,)
Observation Values: [0, 100]

Texas Hold’em No Limit is a variation of Texas Hold’em where there is no limit on the amount of each raise or the number of raises.

Our implementation wraps RLCard and you can refer to its documentation for additional details. Please cite their work if you use this game in research.

Arguments

texas_holdem_no_limit_v6.env(num_players=2)

num_players: Sets the number of players in the game. Minimum is 2.

Texas Hold’em is a poker game played with a regular 52-card deck, by default between 2 players. At the beginning, each player gets two cards. After betting, three community cards are shown and another round of betting follows. At any time, a player can fold; in a two-player game this ends the hand. Rewards depend on the chips raised, as listed in the Rewards section. Unlike ‘Limit Texas Hold’em’, this implementation places no limit on the size or number of raises.

Observation Space

The observation is a dictionary which contains an 'observation' element which is the usual RL observation described below, and an 'action_mask' which holds the legal moves, described in the Legal Actions Mask section.
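As a rough illustration of how the two dictionary entries might be used together, the sketch below picks the highest-scoring action among those the mask allows. The helper name `best_legal_action` and the `scores` list are hypothetical and not part of PettingZoo; the sketch also assumes the mask has one entry per action, with 1 marking a legal move.

```python
def best_legal_action(obs_dict, scores):
    """Return the index of the highest-scoring action allowed by the mask."""
    mask = obs_dict["action_mask"]
    # Keep only the action indices whose mask entry is non-zero (legal).
    legal = [i for i, allowed in enumerate(mask) if allowed]
    if not legal:
        raise ValueError("no legal actions available")
    return max(legal, key=lambda i: scores[i])


# Hand-built sample in the documented layout (not produced by a real env).
sample = {"observation": [0] * 54, "action_mask": [1, 1, 0, 0, 1]}
action = best_legal_action(sample, scores=[0.1, 0.4, 0.9, 0.2, 0.3])
print(action)
```

Here the highest score belongs to an action the mask forbids, so the helper falls back to the best legal one. In a real loop, `scores` would come from your own policy.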

The main observation space is similar to Texas Hold’em. The first 52 entries represent the union of the current player’s hand and the community cards.

Index      Description                           Values
0 - 12     Spades (0: A, 1: 2, …, 12: K)         [0, 1]
13 - 25    Hearts (13: A, 14: 2, …, 25: K)       [0, 1]
26 - 38    Diamonds (26: A, 27: 2, …, 38: K)     [0, 1]
39 - 51    Clubs (39: A, 40: 2, …, 51: K)        [0, 1]
52         Number of Chips of player_0           [0, 100]
53         Number of Chips of player_1           [0, 100]
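To make the index layout above concrete, here is a minimal sketch that decodes a length-54 observation vector into card names and chip counts. The function `decode_observation` and the constants `SUITS` and `RANKS` are illustrative names, not part of PettingZoo or RLCard, and the example vector is built by hand.

```python
SUITS = ("Spades", "Hearts", "Diamonds", "Clubs")
RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")


def decode_observation(obs):
    """Split a length-54 observation vector into visible cards and chip counts."""
    if len(obs) != 54:
        raise ValueError(f"expected 54 entries, got {len(obs)}")
    # Entries 0-51: one flag per card, grouped by suit, ranks A through K.
    cards = [
        f"{RANKS[i % 13]} of {SUITS[i // 13]}"
        for i in range(52)
        if obs[i] == 1
    ]
    # Entries 52 and 53: chip counts for player_0 and player_1.
    chips = {"player_0": obs[52], "player_1": obs[53]}
    return cards, chips


example = [0] * 54
example[0] = 1   # index 0: Ace of Spades
example[25] = 1  # index 25: King of Hearts
example[52] = 2
example[53] = 4
cards, chips = decode_observation(example)
print(cards, chips)
```

Because the first 52 entries are the union of the hand and the community cards, this decoding cannot tell which visible cards are private and which are shared.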

Action Space

Action ID  Action
0          Fold
1          Check & Call
2          Raise Half Pot
3          Raise Full Pot
4          All In
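For readability in agent code, the action IDs above could be given names. The sketch below is one possible way to do that; the `Action` enum and the `legal_actions` helper are hypothetical conveniences rather than anything provided by the library, and the mask is assumed to hold one 0/1 entry per action ID.

```python
from enum import IntEnum


class Action(IntEnum):
    """Names for the five discrete action IDs listed in the table above."""
    FOLD = 0
    CHECK_CALL = 1
    RAISE_HALF_POT = 2
    RAISE_FULL_POT = 3
    ALL_IN = 4


def legal_actions(action_mask):
    """Return the Action members whose mask entry is non-zero."""
    return [Action(i) for i, allowed in enumerate(action_mask) if allowed]


mask = [1, 1, 0, 0, 1]  # hand-written example mask
print([a.name for a in legal_actions(mask)])
```

Since `IntEnum` members compare equal to plain integers, a value such as `Action.ALL_IN` should be usable wherever an integer action is expected.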

Rewards

Winner             Loser
+raised chips/2    -raised chips/2

Version History

  • v6: Upgrade to RLCard 1.0.5, fixes to the action space as ACPC (1.12.0)

  • v5: Upgrade to RLCard 1.0.4, fixes to rewards with greater than 2 players (1.11.1)

  • v4: Upgrade to RLCard 1.0.3 (1.11.0)

  • v3: Fixed bug in arbitrary calls to observe() (1.8.0)

  • v2: Bumped RLCard version, bug fixes, legal action mask in observation replaced illegal move list in infos (1.5.0)

  • v1: Bumped RLCard version, fixed observation space, adopted new agent iteration scheme where all agents are iterated over after they are done (1.4.0)

  • v0: Initial version release (1.0.0)

Usage

AEC

from pettingzoo.classic import texas_holdem_no_limit_v6

env = texas_holdem_no_limit_v6.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        mask = observation["action_mask"]
        # this is where you would insert your policy
        action = env.action_space(agent).sample(mask)

    env.step(action)
env.close()

API

class pettingzoo.classic.rlcard_envs.texas_holdem_no_limit.env(**kwargs)
class pettingzoo.classic.rlcard_envs.texas_holdem_no_limit.raw_env(num_players: int = 2, render_mode: str | None = None, screen_height: int | None = 1000)
render()

Renders the environment as specified by self.render_mode.

Render mode can be ‘human’ to display a window. Other render modes in the default environments are ‘rgb_array’, which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’, which returns the strings printed (specific to classic environments).

step(action)

Accepts and executes the action of the current agent_selection in the environment.

Automatically switches control to the next agent.