pantheonrl.common.multiagentenv.MultiAgentEnv
- class MultiAgentEnv(observation_spaces, action_spaces, ego_ind=0, n_players=2, resample_policy='default', partners=None, ego_extractor=<function extract_obs>)[source]
Bases:
Env, ABC
Base class for all Multi-agent environments.
- Parameters:
observation_spaces (List[Space]) – The observation space for each player
action_spaces (List[Space]) – The action space for each player
ego_ind (int) – The player number that the ego represents
n_players (int) – The number of players in the game
resample_policy (str) – The resampling policy (see set_resample_policy)
partners (List[List[Agent]] | None) – Lists of agents to choose from for the partner players
ego_extractor (Callable[[Observation], Any]) – Function to extract Observation into the type the ego agent expects
Methods
- add_partner_agent(agent, player_num=1) – Add agent to the list of potential partner agents.
- close() – After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
- construct_single_agent_interface(player_num) – Construct a gym interface to be used by a single-agent RL algorithm.
- get_dummy_env(player_num) – Returns a dummy environment with just an observation and action space that a partner agent can use to construct their policy network.
- Returns the current player number for the ego agent.
- get_wrapper_attr(name) – Gets the attribute name from the environment.
- n_reset() – Reset the environment and return which agents will move first along with their initial observations.
- n_step(actions) – Perform the actions specified by the agents that will move.
- render() – Compute the render frames as specified by render_mode during the initialization of the environment.
- Do not resample each partner policy.
- Randomly resamples each partner policy.
- resample_round_robin() – Sets the partner policy to the next option on the list for round-robin sampling.
- reset(*, seed=None, options=None) – Reset environment to an initial state and return the first observation for the ego agent.
- set_ego_extractor(ego_extractor) – Sets the function to extract Observation for the ego agent.
- set_ego_ind(new_ind, silence_partner_warning=False) – Sets the current player number for the ego agent.
- set_partnerid(agent_id, player_num=1) – Set the current partner agent to use.
- set_resample_policy(resample_policy) – Set the resample_partner method to round-robin ("robin") or random ("random").
- step(action) – Run one timestep from the perspective of the ego-agent.
Attributes
- action_space – The action space of the ego agent
- metadata
- np_random – Returns the environment's internal _np_random that, if not set, will initialise with a random seed.
- observation_space – The observation space of the ego agent
- render_mode
- reward_range
- spec
- unwrapped – Returns the base non-wrapped environment.
- property action_space: Space
The action space of the ego agent
- add_partner_agent(agent, player_num=1)[source]
Add agent to the list of potential partner agents. If there are multiple agents that can be a specific player number, the environment randomly samples from them at the start of every episode.
- Parameters:
agent (Agent) – Agent to add
player_num (int) – the player number that this new agent can be
- Return type:
None
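For example, two candidate partners for player 1 can be registered as follows; at the start of each episode the environment then samples one of them at random. This is a minimal sketch only: it assumes the OnPolicyAgent wrapper from pantheonrl.common.agents and Stable-Baselines3's PPO, but any object implementing the library's Agent interface can be added the same way.

    from stable_baselines3 import PPO
    from pantheonrl.common.agents import OnPolicyAgent  # assumed import path

    # `env` is a concrete MultiAgentEnv subclass; player 1 is the partner slot.
    dummy = env.get_dummy_env(1)

    # Two learning partners built against the partner's observation/action spaces.
    partner_a = OnPolicyAgent(PPO("MlpPolicy", dummy, verbose=0))
    partner_b = OnPolicyAgent(PPO("MlpPolicy", dummy, verbose=0))

    # Both are candidates for player 1; one is sampled per episode by default.
    env.add_partner_agent(partner_a, player_num=1)
    env.add_partner_agent(partner_b, player_num=1)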
- close()
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won't raise an error.
- construct_single_agent_interface(player_num)[source]
Construct a gym interface to be used by a single-agent RL algorithm.
Note that when training a policy using this interface, it must be spawned in a separate Thread. Please refer to the custom_sarl.py file in examples to see how to appropriately use this function.
- Parameters:
player_num (int) – the player number to build the interface around
- Returns:
environment to use for the new player
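The note above about threading matters in practice: the partner's learner blocks on the shared environment, so it has to run concurrently with the ego agent's training loop. The following is a rough sketch of the pattern (see the custom_sarl.py example for the canonical version); it assumes Stable-Baselines3's PPO on both sides and an existing MultiAgentEnv instance `env`.

    import threading
    from stable_baselines3 import PPO

    # Build a single-agent view of the shared environment for player 1.
    partner_env = env.construct_single_agent_interface(1)
    partner_model = PPO("MlpPolicy", partner_env, verbose=0)

    # The partner must train in its own thread so it can respond whenever
    # the shared environment asks player 1 for an action.
    partner_thread = threading.Thread(
        target=partner_model.learn,
        kwargs={"total_timesteps": 100_000},
        daemon=True,
    )
    partner_thread.start()

    # The ego agent trains through the MultiAgentEnv's own gym interface.
    ego_model = PPO("MlpPolicy", env, verbose=0)
    ego_model.learn(total_timesteps=100_000)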
- get_dummy_env(player_num)[source]
Returns a dummy environment with just an observation and action space that a partner agent can use to construct their policy network.
- Parameters:
player_num (int) – the partner number to query
- Returns:
Dummy environment for this player number
- get_wrapper_attr(name)
Gets the attribute name from the environment.
- Parameters:
name (str) – The name of the attribute to retrieve
- Return type:
Any
- abstract n_reset()[source]
Reset the environment and return which agents will move first along with their initial observations.
This function is called by the reset function.
- Returns:
agents: Tuple representing the agents that will move first
observations: Tuple representing the agents' initial observations
- Return type:
Tuple[Tuple[int, …], Tuple[Observation | None, …]]
- abstract n_step(actions)[source]
Perform the actions specified by the agents that will move. This function returns a tuple of (next agents, observations, both rewards, done, info).
This function is called by the step function.
- Parameters:
actions (List[ndarray]) – List of actions provided by the agents that are acting on this step.
- Returns:
agents: Tuple representing the agents to call for the next actions
observations: Tuple representing the next observations
rewards: Tuple representing the rewards of all agents
done: Whether the episode has ended
info: Extra information about the environment
- Return type:
Tuple[Tuple[int, …], Tuple[Observation | None, …], Tuple[float, …], bool, Dict]
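To make the contract of n_reset and n_step concrete, here is a minimal sketch of a two-player simultaneous-move subclass. It is illustrative only: the toy game, the choice to have both players act on every step, and the use of raw numpy arrays in place of the library's Observation wrapper are simplifications, not the required pattern.

    import numpy as np
    from gymnasium import spaces
    from pantheonrl.common.multiagentenv import MultiAgentEnv

    class MatchingEnv(MultiAgentEnv):
        """Toy simultaneous-move game: both players get +1 when their actions match.

        Sketch only; a real implementation should return the library's
        Observation type rather than raw arrays.
        """

        def __init__(self):
            obs = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
            act = spaces.Discrete(2)
            super().__init__(
                observation_spaces=[obs, obs],
                action_spaces=[act, act],
                ego_ind=0,
                n_players=2,
            )

        def n_reset(self):
            # Both players move on the first step and see the same trivial observation.
            start_obs = np.zeros(1, dtype=np.float32)
            return (0, 1), (start_obs, start_obs)

        def n_step(self, actions):
            # Both players acted this step; reward 1 for each if the actions match.
            reward = 1.0 if int(actions[0]) == int(actions[1]) else 0.0
            next_obs = np.zeros(1, dtype=np.float32)
            # One-shot game: the episode terminates after a single joint action.
            return (0, 1), (next_obs, next_obs), (reward, reward), True, {}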
- property np_random: Generator
Returns the environment's internal _np_random that, if not set, will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property observation_space: Space
The observation space of the ego agent
- render()
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment's metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Note:
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
None (default): no render is computed.
"human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn't need to be called. Returns None.
"rgb_array": Return a single frame representing the current state of the environment. A frame is an np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
"ansi": Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
"rgb_array_list" and "ansi_list": List-based versions of the render modes are possible (except "human") through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() or reset() is called.
- Note:
Make sure that your class's metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; rather, these parameters should be specified in the environment initialiser, i.e., gymnasium.make("CartPole-v1", render_mode="human").
- Return type:
RenderFrame | list[RenderFrame] | None
- resample_round_robin()[source]
Sets the partner policy to the next option on the list for round-robin sampling.
Note: This function is only valid for 2-player environments
- Return type:
None
- reset(*, seed=None, options=None)[source]
Reset environment to an initial state and return the first observation for the ego agent.
- Returns:
Tuple of the ego agent's first observation and an info dictionary
- Parameters:
seed (int | None) – Seed for the environment's random number generator
options (dict[str, Any] | None) – Additional options used to customize the reset
- Return type:
tuple[Observation, dict[str, Any]]
- set_ego_extractor(ego_extractor)[source]
Sets the function to extract Observation for the ego agent.
- Parameters:
ego_extractor (Callable[[Observation], Any]) – Function to extract Observation into the type the ego agent expects
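For instance, an identity extractor leaves the Observation untouched so the ego agent receives the raw object instead of whatever the default extract_obs produces. This is only a usage sketch; any callable that accepts an Observation works.

    # Hand the raw Observation object straight to the ego agent.
    env.set_ego_extractor(lambda obs: obs)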
- set_ego_ind(new_ind, silence_partner_warning=False)[source]
Sets the current player number for the ego agent
- Warning: Modifying the ego_ind after partners have been added will change the player number of those partners as well.
- Parameters:
new_ind (int) – the new index of the ego player
silence_partner_warning (bool) – Whether to suppress the partner warning
- set_partnerid(agent_id, player_num=1)[source]
Set the current partner agent to use
- Parameters:
agent_id (int) – agent_id to use as current partner
player_num (int) – The player number
- Return type:
None
- set_resample_policy(resample_policy)[source]
Set the resample_partner method to round-robin ("robin") or random ("random")
- Parameters:
resample_policy (str) – The new resampling policy to use. Valid values are: “default”, “robin”, “random”, or “null”
- Return type:
None
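For example, to cycle deterministically through the partners registered for a two-player environment instead of sampling one at random each episode:

    # Visit each registered partner in order, one per episode.
    env.set_resample_policy("robin")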
- step(action)[source]
Run one timestep from the perspective of the ego-agent. This involves calling the ego_step function and the alt_step function to get to the next observation of the ego agent.
Accepts the ego-agent's action and returns a tuple of (observation, reward, terminated, truncated, info) from the perspective of the ego agent.
Note that when the environment is done, the final observation is the latest observation provided by the environment, which may be the same as the previous observation given to the agent, especially in turn-based settings.
- Parameters:
action (ndarray) – An action provided by the ego-agent.
- Returns:
observation: Ego-agent’s next observation
reward: Amount of reward returned after previous action
terminated: Whether the episode has ended (call reset() if True)
truncated: Whether the episode was truncated (call reset() if True)
info: Extra information about the environment
- Return type:
tuple[Observation | Any, float, bool, bool, dict[str, Any]]
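From the ego agent's side the environment behaves like any gymnasium.Env, so a hand-rolled rollout is the usual loop. This sketch assumes `env` is a concrete subclass with its partner agents already registered and uses a random action as a stand-in for the ego policy.

    import numpy as np

    obs, info = env.reset()
    done = False
    while not done:
        # Random stand-in for the ego policy; partner actions are handled
        # internally by step() via the registered partner agents.
        action = np.asarray(env.action_space.sample())
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated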
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance