Texas Hold’em#
 
This environment is part of the classic environments. Please read that page first for general information.
| Import | `from pettingzoo.classic import texas_holdem_v4` |
|---|---|
| Actions | Discrete |
| Parallel API | Yes |
| Manual Control | No |
| Agents | `agents= ['player_0', 'player_1']` |
| Agents | 2 |
| Action Shape | Discrete(4) |
| Action Values | Discrete(4) |
| Observation Shape | (72,) |
| Observation Values | [0, 1] |
Arguments#
```python
texas_holdem_v4.env(num_players=2)
```
num_players: Sets the number of players in the game. Minimum is 2.
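For example, a table with more than two players can be created by passing a larger `num_players` (a minimal sketch; the value 3 is arbitrary):

```python
from pettingzoo.classic import texas_holdem_v4

# Any num_players >= 2 is accepted; 3 is used here purely for illustration.
env = texas_holdem_v4.env(num_players=3)
env.reset(seed=42)
print(env.agents)  # one entry per seated player
```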
Observation Space#
The observation is a dictionary which contains an 'observation' element which is the usual RL observation described below, and an 'action_mask' which holds the legal moves, described in the Legal Actions Mask section.
The main observation space is a vector of 72 boolean integers. The first 52 entries depict the current player's hand plus any community cards, and the remaining 20 encode the chips raised in each of the four betting rounds, as follows:
| Index | Description | 
|---|---|
| 0 - 12 | Spades | 
| 13 - 25 | Hearts | 
| 26 - 38 | Diamonds | 
| 39 - 51 | Clubs | 
| 52 - 56 | Chips raised in Round 1 | 
| 57 - 61 | Chips raised in Round 2 | 
| 62 - 66 | Chips raised in Round 3 | 
| 67 - 71 | Chips raised in Round 4 | 
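As a sketch of how this layout can be decoded, the helper below splits the 72-entry vector into its card and raise blocks. The rank order within each 13-card suit block (Ace first, then 2 through King) and the one-hot reading of the 5-entry raise blocks are assumptions based on RLCard's encoding; the helper itself is illustrative and not part of the environment API.

```python
import numpy as np

SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]
# Assumed rank order within each 13-card suit block (RLCard deck order).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

def describe_observation(obs):
    """Print the cards and per-round raise counts encoded in a 72-entry observation."""
    obs = np.asarray(obs)
    # Entries 0-51: one bit per card, grouped by suit in blocks of 13.
    for idx in np.flatnonzero(obs[:52]):
        suit, rank = divmod(int(idx), 13)
        print(f"Visible card: {RANKS[rank]} of {SUITS[suit]}")
    # Entries 52-71: four blocks of 5, one per betting round, assumed to be a
    # one-hot encoding of how many chips were raised in that round (0-4).
    for rnd in range(4):
        block = obs[52 + 5 * rnd : 52 + 5 * (rnd + 1)]
        if block.any():
            print(f"Round {rnd + 1}: {int(np.argmax(block))} chips raised")
```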
Legal Actions Mask#
The legal moves available to the current agent are found in the action_mask element of the dictionary observation. The action_mask is a binary vector where each index indicates whether the corresponding action is legal. The action_mask will be all zeros for any agent except the one whose turn it is. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
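As a short sketch, the indices of the currently legal actions can be read straight off the mask:

```python
import numpy as np
from pettingzoo.classic import texas_holdem_v4

env = texas_holdem_v4.env()
env.reset(seed=42)

observation, reward, termination, truncation, info = env.last()
# Indices where the mask is 1 are the legal action IDs for the acting agent.
legal_actions = np.flatnonzero(observation["action_mask"])
print(legal_actions)
```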
Action Space#
| Action ID | Action | 
|---|---|
| 0 | Call | 
| 1 | Raise | 
| 2 | Fold | 
| 3 | Check | 
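A policy can combine these IDs with the action mask when picking a move. The sketch below is illustrative only: the constant names and the check-then-call rule are not part of the environment.

```python
import numpy as np
from pettingzoo.classic import texas_holdem_v4

# Illustrative names for the action IDs in the table above.
CALL, RAISE, FOLD, CHECK = 0, 1, 2, 3

env = texas_holdem_v4.env()
env.reset(seed=42)
observation, reward, termination, truncation, info = env.last()
mask = observation["action_mask"]
# Check when legal, otherwise call, otherwise fall back to any legal action.
if mask[CHECK]:
    action = CHECK
elif mask[CALL]:
    action = CALL
else:
    action = int(np.flatnonzero(mask)[0])
env.step(action)
```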
Rewards#
| Winner | Loser | 
|---|---|
| +raised chips/2 | -raised chips/2 | 
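For example, if 10 chips were raised in total during the hand, the winner's reward is +5 and the loser's is -5.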
Version History#
- v4: Upgrade to RLCard 1.0.3 (1.11.0) 
- v3: Fixed bug in arbitrary calls to observe() (1.8.0) 
- v2: Bumped RLCard version, bug fixes, legal action mask in observation replaced illegal move list in infos (1.5.0) 
- v1: Bumped RLCard version, fixed observation space, adopted new agent iteration scheme where all agents are iterated over after they are done (1.4.0) 
- v0: Initial versions release (1.0.0) 
Usage#
AEC#
```python
from pettingzoo.classic import texas_holdem_v4

env = texas_holdem_v4.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        mask = observation["action_mask"]
        # this is where you would insert your policy
        action = env.action_space(agent).sample(mask)

    env.step(action)
env.close()
```
API#
- class pettingzoo.classic.rlcard_envs.texas_holdem.raw_env(num_players: int = 2, render_mode: str | None = None, screen_height: int | None = 1000)[source]#
- render()[source]#
- Renders the environment as specified by self.render_mode. Render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).
 
