Simple Speaker Listener
This environment is part of the MPE environments. Please read that page first for general information.
| Import | from pettingzoo.mpe import simple_speaker_listener_v4 |
|---|---|
| Actions | Discrete/Continuous |
| Parallel API | Yes |
| Manual Control | No |
| Agents | agents= [speaker_0, listener_0] |
| Agents | 2 |
| Action Shape | (3),(5) |
| Action Values | Discrete(3), Discrete(5) / Box(0.0, 1.0, (3)), Box(0.0, 1.0, (5)) |
| Observation Shape | (3),(11) |
| Observation Values | (-inf,inf) |
| State Shape | (14,) |
| State Values | (-inf,inf) |
This environment is similar to simple_reference, except that one agent is the ‘speaker’ (gray), which can speak but cannot move, while the other agent is the ‘listener’, which cannot speak but must navigate to the correct landmark.
Speaker observation space: [goal_id]
Listener observation space: [self_vel, all_landmark_rel_positions, communication]
Speaker action space: [say_0, say_1, say_2]
Listener action space: [no_action, move_left, move_right, move_down, move_up]
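The listener's flat observation can be unpacked by slicing it in the order listed above. The sketch below assumes a 2D world with three landmarks and three communication channels, which is one layout consistent with the listener's observation shape of (11). `split_listener_obs` is a hypothetical helper written for illustration and is not part of PettingZoo.

```python
from typing import Dict, Sequence

# Assumed layout (not confirmed by this page): 2D velocity, three landmarks
# with 2D relative positions, and three communication channels: 2 + 6 + 3 = 11.
VEL_DIM = 2
NUM_LANDMARKS = 3
POS_DIM = 2
COMM_DIM = 3


def split_listener_obs(obs: Sequence[float]) -> Dict[str, list]:
    """Split a flat listener observation into named parts (hypothetical helper)."""
    expected = VEL_DIM + NUM_LANDMARKS * POS_DIM + COMM_DIM
    if len(obs) != expected:
        raise ValueError(f"expected {expected} values, got {len(obs)}")
    values = [float(x) for x in obs]
    vel_end = VEL_DIM
    pos_end = vel_end + NUM_LANDMARKS * POS_DIM
    # One [dx, dy] pair per landmark, in the order the environment lists them.
    landmarks = [values[i:i + POS_DIM] for i in range(vel_end, pos_end, POS_DIM)]
    return {
        "self_vel": values[:vel_end],
        "landmark_rel_positions": landmarks,
        "communication": values[pos_end:],
    }


parts = split_listener_obs(range(11))
print(parts["self_vel"])                # [0.0, 1.0]
print(parts["landmark_rel_positions"])  # [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
print(parts["communication"])           # [8.0, 9.0, 10.0]
```

The same function accepts a NumPy array returned by the environment, since it only relies on `len` and iteration.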
Arguments
simple_speaker_listener_v4.env(max_cycles=25, continuous_actions=False, dynamic_rescaling=False)
max_cycles
: number of frames (a step for each agent) until the game terminates
continuous_actions
: whether agent action spaces are discrete (default) or continuous
dynamic_rescaling
: whether to rescale the size of agents and landmarks based on the screen size
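As a minimal sketch of overriding these arguments, the snippet below merges caller overrides with the defaults from the signature above and rejects unknown names before the environment is built. `build_kwargs` and `make_env` are hypothetical helpers, not PettingZoo functions. The import is deferred so the argument handling can be checked without PettingZoo installed.

```python
# Defaults copied from the documented signature above.
DEFAULT_KWARGS = {
    "max_cycles": 25,
    "continuous_actions": False,
    "dynamic_rescaling": False,
}


def build_kwargs(**overrides):
    """Return the documented defaults updated with overrides (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_KWARGS)
    if unknown:
        raise TypeError(f"unknown argument(s): {sorted(unknown)}")
    return {**DEFAULT_KWARGS, **overrides}


def make_env(**overrides):
    """Create the AEC environment with the merged arguments (hypothetical helper)."""
    # Imported lazily so build_kwargs works even without PettingZoo installed.
    from pettingzoo.mpe import simple_speaker_listener_v4

    return simple_speaker_listener_v4.env(**build_kwargs(**overrides))


print(build_kwargs(max_cycles=50, continuous_actions=True))
```

With PettingZoo installed, `make_env(max_cycles=50, continuous_actions=True)` would return an environment with longer episodes and Box action spaces.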
Usage
AEC
```python
from pettingzoo.mpe import simple_speaker_listener_v4

env = simple_speaker_listener_v4.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        # a finished agent must step with None
        action = None
    else:
        # this is where you would insert your policy
        action = env.action_space(agent).sample()

    env.step(action)
env.close()
```
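One way to fill in the "insert your policy" step is a per-agent lookup that falls back to random sampling for agents without a policy. The sketch below is illustrative only. `choose_action` is a hypothetical helper, and the agent names `speaker_0` and `listener_0` and the mapping of index 0 to no_action are assumptions based on the listed action order.

```python
from typing import Any, Callable, Dict

Policy = Callable[[Any], Any]


def choose_action(
    agent: str,
    observation: Any,
    policies: Dict[str, Policy],
    sample_fallback: Callable[[], Any],
) -> Any:
    """Use the agent's policy if one is registered, otherwise sample randomly."""
    policy = policies.get(agent)
    if policy is None:
        return sample_fallback()
    return policy(observation)


# Illustrative policy table: the listener always picks index 0 (assumed to be
# no_action); the speaker has no entry and uses the fallback.
policies = {"listener_0": lambda obs: 0}
print(choose_action("listener_0", None, policies, lambda: 4))  # 0
print(choose_action("speaker_0", None, policies, lambda: 2))   # 2
```

Inside the AEC loop, the call would look like `action = choose_action(agent, observation, policies, env.action_space(agent).sample)`, still guarded by the termination and truncation check.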
Parallel
```python
from pettingzoo.mpe import simple_speaker_listener_v4

env = simple_speaker_listener_v4.parallel_env(render_mode="human")
observations, infos = env.reset()

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```
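Because the Parallel API returns a reward dictionary on every step, per-agent episode returns can be tracked by summing those dictionaries. The following is a minimal sketch. `add_rewards` and `run_episode` are hypothetical helpers, and the reward values in the demonstration are made up for illustration.

```python
from typing import Dict, Mapping


def add_rewards(totals: Dict[str, float], rewards: Mapping[str, float]) -> Dict[str, float]:
    """Add one step's rewards to the running per-agent totals (in place)."""
    for agent, reward in rewards.items():
        totals[agent] = totals.get(agent, 0.0) + float(reward)
    return totals


def run_episode(env, seed=None) -> Dict[str, float]:
    """Roll out one episode with random actions and return per-agent returns."""
    observations, infos = env.reset(seed=seed)
    totals: Dict[str, float] = {}
    while env.agents:
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        add_rewards(totals, rewards)
    return totals


# Demonstration with made-up reward dictionaries (no environment required).
totals: Dict[str, float] = {}
add_rewards(totals, {"speaker_0": -1.5, "listener_0": -1.5})
add_rewards(totals, {"speaker_0": -0.5, "listener_0": -0.5})
print(totals)  # {'speaker_0': -2.0, 'listener_0': -2.0}
```

With PettingZoo installed, `run_episode(simple_speaker_listener_v4.parallel_env())` would return the summed rewards for one random-action episode.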