pantheonrl.common.multiagentenv.MultiAgentEnv
- class MultiAgentEnv(observation_spaces, action_spaces, ego_ind=0, n_players=2, resample_policy='default', partners=None, ego_extractor=<function extract_obs>)[source]
Bases:
Env, ABC
Base class for all Multi-agent environments.
- Parameters:
observation_spaces (List[Space]) – The observation space for each player
action_spaces (List[Space]) – The action space for each player
ego_ind (int) – The player number that the ego represents
n_players (int) – The number of players in the game
resample_policy (str) – The resampling policy (see set_resample_policy)
partners (List[List[Agent]] | None) – Lists of agents to choose from for the partner players
ego_extractor (Callable[[Observation], Any]) – Function to extract Observation into the type the ego agent expects
Methods
- add_partner_agent(agent, player_num=1) – Add agent to the list of potential partner agents.
- close() – After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
- construct_single_agent_interface(player_num) – Construct a gym interface to be used by a single-agent RL algorithm.
- get_dummy_env(player_num) – Returns a dummy environment with just an observation and action space that a partner agent can use to construct their policy network.
- Returns the current player number for the ego agent.
- get_wrapper_attr(name) – Gets the attribute name from the environment.
- n_reset() – Reset the environment and return which agents will move first along with their initial observations.
- n_step(actions) – Perform the actions specified by the agents that will move.
- render() – Compute the render frames as specified by render_mode during the initialization of the environment.
- Do not resample each partner policy.
- Randomly resamples each partner policy.
- resample_round_robin() – Sets the partner policy to the next option on the list for round-robin sampling.
- reset(*, seed=None, options=None) – Reset environment to an initial state and return the first observation for the ego agent.
- set_ego_extractor(ego_extractor) – Sets the function to extract Observation for the ego agent.
- set_ego_ind(new_ind, silence_partner_warning=False) – Sets the current player number for the ego agent.
- set_partnerid(agent_id, player_num=1) – Set the current partner agent to use.
- set_resample_policy(resample_policy) – Set the resample_partner method to round-robin ("robin") or random ("random").
- step(action) – Run one timestep from the perspective of the ego-agent.
Attributes
- action_space – The action space of the ego agent
- metadata
- np_random – Returns the environment's internal _np_random that, if not set, will initialise with a random seed.
- observation_space – The observation space of the ego agent
- render_mode
- reward_range
- spec
- unwrapped – Returns the base non-wrapped environment.
- property action_space: Space
The action space of the ego agent
- add_partner_agent(agent, player_num=1)[source]
Add agent to the list of potential partner agents. If there are multiple agents that can be a specific player number, the environment randomly samples from them at the start of every episode.
- Parameters:
agent (Agent) – Agent to add
player_num (int) – the player number that this new agent can be
- Return type:
None
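For example, two candidate partners for player 1 can be registered as follows; at the start of each episode the environment then samples one of them at random. This is a minimal sketch only: it assumes the OnPolicyAgent wrapper from pantheonrl.common.agents and Stable-Baselines3's PPO, but any object implementing the library's Agent interface can be added the same way.

    from stable_baselines3 import PPO
    from pantheonrl.common.agents import OnPolicyAgent  # assumed import path

    # `env` is a concrete MultiAgentEnv subclass; player 1 is the partner slot.
    dummy = env.get_dummy_env(1)

    # Two learning partners built against the partner's observation/action spaces.
    partner_a = OnPolicyAgent(PPO("MlpPolicy", dummy, verbose=0))
    partner_b = OnPolicyAgent(PPO("MlpPolicy", dummy, verbose=0))

    # Both are candidates for player 1; one is sampled per episode by default.
    env.add_partner_agent(partner_a, player_num=1)
    env.add_partner_agent(partner_b, player_num=1)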
- close()
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won't raise an error.
- construct_single_agent_interface(player_num)[source]
Construct a gym interface to be used by a single-agent RL algorithm.
Note that when training a policy using this interface, it must be spawned in a separate Thread. Please refer to the custom_sarl.py file in examples to see how to appropriately use this function.
- Parameters:
player_num (int) – the player number to build the interface around
- Returns:
environment to use for the new player
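The note above about threading matters in practice: the partner's learner blocks on the shared environment, so it has to run concurrently with the ego agent's training loop. The following is a rough sketch of the pattern (see the custom_sarl.py example for the canonical version); it assumes Stable-Baselines3's PPO on both sides and an existing MultiAgentEnv instance `env`.

    import threading
    from stable_baselines3 import PPO

    # Build a single-agent view of the shared environment for player 1.
    partner_env = env.construct_single_agent_interface(1)
    partner_model = PPO("MlpPolicy", partner_env, verbose=0)

    # The partner must train in its own thread so it can respond whenever
    # the shared environment asks player 1 for an action.
    partner_thread = threading.Thread(
        target=partner_model.learn,
        kwargs={"total_timesteps": 100_000},
        daemon=True,
    )
    partner_thread.start()

    # The ego agent trains through the MultiAgentEnv's own gym interface.
    ego_model = PPO("MlpPolicy", env, verbose=0)
    ego_model.learn(total_timesteps=100_000)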
- get_dummy_env(player_num)[source]
Returns a dummy environment with just an observation and action space that a partner agent can use to construct their policy network.
- Parameters:
player_num (int) – the partner number to query
- Returns:
Dummy environment for this player number
- get_wrapper_attr(name)
Gets the attribute name from the environment.
- Parameters:
name (str) – The name of the attribute to retrieve
- Return type:
Any
- abstract n_reset()[source]
Reset the environment and return which agents will move first along with their initial observations.
This function is called by the reset function.
- Returns:
agents: Tuple representing the agents that will move first
observations: Tuple representing the agents' initial observations
- Return type:
Tuple[Tuple[int, …], Tuple[Observation | None, …]]
- abstract n_step(actions)[source]
Perform the actions specified by the agents that will move. This function returns a tuple of (next agents, observations, both rewards, done, info).
This function is called by the step function.
- Parameters:
actions (List[ndarray]) – List of actions provided by the agents that are acting on this step.
- Returns:
agents: Tuple representing the agents to call for the next actions
observations: Tuple representing the next observations
rewards: Tuple representing the rewards of all agents
done: Whether the episode has ended
info: Extra information about the environment
- Return type:
Tuple[Tuple[int, …], Tuple[Observation | None, …], Tuple[float, …], bool, Dict]
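To make the contract of n_reset and n_step concrete, here is a minimal sketch of a two-player simultaneous-move subclass. It is illustrative only: the toy game, the choice to have both players act on every step, and the use of raw numpy arrays in place of the library's Observation wrapper are simplifications, not the required pattern.

    import numpy as np
    from gymnasium import spaces
    from pantheonrl.common.multiagentenv import MultiAgentEnv

    class MatchingEnv(MultiAgentEnv):
        """Toy simultaneous-move game: both players get +1 when their actions match.

        Sketch only; a real implementation should return the library's
        Observation type rather than raw arrays.
        """

        def __init__(self):
            obs = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
            act = spaces.Discrete(2)
            super().__init__(
                observation_spaces=[obs, obs],
                action_spaces=[act, act],
                ego_ind=0,
                n_players=2,
            )

        def n_reset(self):
            # Both players move on the first step and see the same trivial observation.
            start_obs = np.zeros(1, dtype=np.float32)
            return (0, 1), (start_obs, start_obs)

        def n_step(self, actions):
            # Both players acted this step; reward 1 for each if the actions match.
            reward = 1.0 if int(actions[0]) == int(actions[1]) else 0.0
            next_obs = np.zeros(1, dtype=np.float32)
            # One-shot game: the episode terminates after a single joint action.
            return (0, 1), (next_obs, next_obs), (reward, reward), True, {}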
- property np_random: Generator
Returns the environment's internal _np_random that, if not set, will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property observation_space: Space
The observation space of the ego agent
- render()
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment's metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Note:
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
None (default): no render is computed.
"human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn't need to be called. Returns None.
"rgb_array": Return a single frame representing the current state of the environment. A frame is an np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
"ansi": Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
"rgb_array_list" and "ansi_list": List-based versions of the render modes are possible (except "human") through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() or reset() is called.
- Note:
Make sure that your class's metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; rather, these parameters should be specified in the environment initialiser, i.e., gymnasium.make("CartPole-v1", render_mode="human").
- Return type:
RenderFrame | list[RenderFrame] | None
- resample_round_robin()[source]
Sets the partner policy to the next option on the list for round-robin sampling.
Note: This function is only valid for 2-player environments
- Return type:
None
- reset(*, seed=None, options=None)[source]
Reset environment to an initial state and return the first observation for the ego agent.
- Returns:
Tuple of the ego agent's first observation and an info dictionary
- Parameters:
seed (int | None) – Seed for the environment's random number generator
options (dict[str, Any] | None) – Additional options used to customize the reset
- Return type:
tuple[Observation, dict[str, Any]]
- set_ego_extractor(ego_extractor)[source]
Sets the function to extract Observation for the ego agent.
- Parameters:
ego_extractor (Callable[[Observation], Any]) – Function to extract Observation into the type the ego agent expects
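For instance, an identity extractor leaves the Observation untouched so the ego agent receives the raw object instead of whatever the default extract_obs produces. This is only a usage sketch; any callable that accepts an Observation works.

    # Hand the raw Observation object straight to the ego agent.
    env.set_ego_extractor(lambda obs: obs)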
- set_ego_ind(new_ind, silence_partner_warning=False)[source]
Sets the current player number for the ego agent
- Warning: Modifying the ego_ind after partners have been added will change the player number of those partners as well.
- Parameters:
new_ind (int) – the new index of the ego player
silence_partner_warning (bool) – Whether to suppress the partner warning
- set_partnerid(agent_id, player_num=1)[source]
Set the current partner agent to use
- Parameters:
agent_id (int) – agent_id to use as current partner
player_num (int) – The player number
- Return type:
None
- set_resample_policy(resample_policy)[source]
Set the resample_partner method to round-robin ("robin") or random ("random")
- Parameters:
resample_policy (str) – The new resampling policy to use. Valid values are: “default”, “robin”, “random”, or “null”
- Return type:
None
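For example, to cycle deterministically through the partners registered for a two-player environment instead of sampling one at random each episode:

    # Visit each registered partner in order, one per episode.
    env.set_resample_policy("robin")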
- step(action)[source]
Run one timestep from the perspective of the ego-agent. This involves calling the ego_step function and the alt_step function to get to the next observation of the ego agent.
Accepts the ego-agent's action and returns a tuple of (observation, reward, terminated, truncated, info) from the perspective of the ego agent.
Note that when the environment is done, the final observation is the latest observation provided by the environment, which may be the same as the previous observation given to the agent, especially in turn-based settings.
- Parameters:
action (ndarray) – An action provided by the ego-agent.
- Returns:
observation: Ego-agent’s next observation
reward: Amount of reward returned after previous action
terminated: Whether the episode has ended (call reset() if True)
truncated: Whether the episode was truncated (call reset() if True)
info: Extra information about the environment
- Return type:
tuple[Observation | Any, float, bool, bool, dict[str, Any]]
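From the ego agent's side the environment behaves like any gymnasium.Env, so a hand-rolled rollout is the usual loop. This sketch assumes `env` is a concrete subclass with its partner agents already registered and uses a random action as a stand-in for the ego policy.

    import numpy as np

    obs, info = env.reset()
    done = False
    while not done:
        # Random stand-in for the ego policy; partner actions are handled
        # internally by step() via the registered partner agents.
        action = np.asarray(env.action_space.sample())
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated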
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance