pantheonrl.common.wrappers.SimultaneousFrameStack

class SimultaneousFrameStack(env, numframes, defaultobs=None)[source]

Bases: SimultaneousEnv

Wrapper that stacks the observations of a simultaneous environment.

Parameters:
  • env (Env) – The environment to wrap

  • numframes (int) – The number of frames to stack for each observation

  • defaultobs (ndarray | None) – The default observation that fills old segments of the frame stacks.
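
A minimal usage sketch (not part of the library's documentation). RPSEnv is assumed to be the Rock-Paper-Scissors SimultaneousEnv shipped with PantheonRL; any other SimultaneousEnv subclass works the same way:

    from pantheonrl.common.wrappers import SimultaneousFrameStack
    # Assumed import path for an example SimultaneousEnv; substitute your own.
    from pantheonrl.envs.rpsgym.rps import RPSEnv

    # Each agent now observes its last 4 frames stacked together; frames that
    # do not exist yet at the start of an episode are filled with defaultobs.
    env = SimultaneousFrameStack(RPSEnv(), numframes=4)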

Methods

add_partner_agent

Add agent to the list of potential partner agents.

close

After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

construct_single_agent_interface

Construct a gym interface to be used by a single-agent RL algorithm.

get_dummy_env

Returns a dummy environment with just an observation and action space that a partner agent can use to construct their policy network.

get_ego_ind

Returns the current player number for the ego agent

get_wrapper_attr

Gets the attribute name from the environment.

multi_reset

Reset the environment and return the observations of both agents.

multi_step

Perform the ego-agent's and partner's actions.

n_reset

Reset the environment and return which agents will move first along with their initial observations.

n_step

Perform the actions specified by the agents that will move.

render

Compute the render frames as specified by render_mode during the initialization of the environment.

resample_null

Do not resample each partner policy

resample_random

Randomly resamples each partner policy

resample_round_robin

Sets the partner policy to the next option on the list for round-robin sampling.

reset

Reset environment to an initial state and return the first observation for the ego agent.

set_ego_extractor

Sets the function to extract Observation for the ego agent.

set_ego_ind

Sets the current player number for the ego agent

set_partnerid

Set the current partner agent to use

set_resample_policy

Set the resample_partner method, e.g. to round-robin ("robin") or random ("random")

step

Run one timestep from the perspective of the ego-agent.

Attributes

action_space

The action space of the ego agent

metadata

np_random

Returns the environment's internal _np_random generator; if it is not set, it will be initialised with a random seed.

observation_space

The observation space of the ego agent

render_mode

reward_range

spec

unwrapped

Returns the base non-wrapped environment.

property action_space: Space

The action space of the ego agent

add_partner_agent(agent, player_num=1)

Add agent to the list of potential partner agents. If there are multiple agents that can be a specific player number, the environment randomly samples from them at the start of every episode.

Parameters:
  • agent (Agent) – Agent to add

  • player_num (int) – the player number that this new agent can be

Return type:

None

close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.

construct_single_agent_interface(player_num)

Construct a gym interface to be used by a single-agent RL algorithm.

Note that when training a policy using this interface, it must be spawned in a separate Thread. Please refer to the custom_sarl.py file in examples to see how to appropriately use this function.

Parameters:

player_num (int) – the player number to build the interface around

Returns:

environment to use for the new player
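
A hedged sketch of the threading pattern described above, assuming env is a SimultaneousFrameStack instance and a stable-baselines3 learner is used; the learner choice and timestep count are illustrative, and examples/custom_sarl.py remains the authoritative reference:

    import threading

    from stable_baselines3 import PPO

    # Build a single-agent gym interface around player 1 and train a learner
    # on it in a background thread, as the note above requires.
    partner_interface = env.construct_single_agent_interface(1)
    partner_model = PPO("MlpPolicy", partner_interface, verbose=0)

    trainer = threading.Thread(
        target=partner_model.learn,
        kwargs={"total_timesteps": 10_000},
        daemon=True,
    )
    trainer.start()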

get_dummy_env(player_num)

Returns a dummy environment with just an observation and action space that a partner agent can use to construct their policy network.

Parameters:

player_num (int) – the partner number to query

Returns:

Dummy environment for this player number
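
A hedged sketch combining get_dummy_env with add_partner_agent; the OnPolicyAgent import path and the stable-baselines3 learner are assumptions:

    from stable_baselines3 import PPO

    from pantheonrl.common.agents import OnPolicyAgent  # assumed import path

    # Build a policy network against the dummy spaces for player 1, then
    # register the resulting agent as a potential partner.
    partner = OnPolicyAgent(PPO("MlpPolicy", env.get_dummy_env(1), verbose=0))
    env.add_partner_agent(partner, player_num=1)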

get_ego_ind()

Returns the current player number for the ego agent

get_wrapper_attr(name)

Gets the attribute name from the environment.

Parameters:

name (str) –

Return type:

Any

multi_reset()[source]

Reset the environment and return the observations of both agents.

This function is called by the reset function.

Returns:

The observations of both agents

Return type:

Tuple[ndarray, ndarray]

multi_step(ego_action, alt_action)[source]

Perform the ego-agent’s and partner’s actions. This function returns a tuple of (observations, both rewards, done, info).

This function is called by the step function.

Parameters:
  • ego_action (ndarray) – An action provided by the ego-agent.

  • alt_action (ndarray) – An action provided by the partner.

Returns:

observations: Tuple representing the next observations (ego, alt)

rewards: Tuple representing the rewards of both agents (ego, alt)

done: Whether the episode has ended

info: Extra information about the environment

Return type:

Tuple[Tuple[ndarray | None, ndarray | None], Tuple[float, float], bool, Dict]
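
A hedged sketch of driving the wrapper through multi_reset and multi_step directly; the random actions are placeholders, and the partner is assumed to share the ego agent's action space:

    # Roll out one episode through the low-level multi-agent interface.
    ego_obs, alt_obs = env.multi_reset()
    done = False
    while not done:
        ego_action = env.action_space.sample()  # placeholder ego policy
        alt_action = env.action_space.sample()  # placeholder partner policy
        (ego_obs, alt_obs), (ego_rew, alt_rew), done, info = env.multi_step(
            ego_action, alt_action
        )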

n_reset()

Reset the environment and return which agents will move first along with their initial observations.

This function is called by the reset function.

Returns:

agents: Tuple representing the agents that will move first

observations: Tuple representing the observations of both agents

Return type:

Tuple[Tuple[int, …], Tuple[Observation | None, …]]

n_step(actions)

Perform the actions specified by the agents that will move. This function returns a tuple of (next agents, observations, both rewards, done, info).

This function is called by the step function.

Parameters:

actions (List[ndarray]) – List of actions provided by the agents that are acting on this step.

Returns:

agents: Tuple representing the agents to call for the next actions

observations: Tuple representing the next observations (ego, alt)

rewards: Tuple representing the rewards of all agents

done: Whether the episode has ended

info: Extra information about the environment

Return type:

Tuple[Tuple[int, …], Tuple[Observation | None, …], Tuple[float, …], bool, Dict]

property np_random: Generator

Returns the environment's internal _np_random generator; if it is not set, it will be initialised with a random seed.

Returns:

Instance of np.random.Generator

property observation_space: Space

The observation space of the ego agent

render()

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata render modes (env.metadata[“render_modes”]) should contain the render modes the environment supports. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Note:

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

  • None (default): no render is computed.

  • “human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.

  • “rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • “ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • “rgb_array_list” and “ansi_list”: List-based versions of these render modes (except “human”) are available through the gymnasium.wrappers.RenderCollection wrapper, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.

Note:

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function no longer accepts parameters; instead, these should be specified when the environment is initialised, e.g. gymnasium.make("CartPole-v1", render_mode="human").

Return type:

RenderFrame | list[RenderFrame] | None

resample_null()

Do not resample each partner policy

Return type:

None

resample_random()

Randomly resamples each partner policy

Return type:

None

resample_round_robin()

Sets the partner policy to the next option on the list for round-robin sampling.

Note: This function is only valid for 2-player environments

Return type:

None

reset(*, seed=None, options=None)

Reset environment to an initial state and return the first observation for the ego agent.

Returns:

Ego-agent’s first observation

Parameters:
  • seed (int | None) –

  • options (dict[str, Any] | None) –

Return type:

tuple[Observation, dict[str, Any]]

set_ego_extractor(ego_extractor)

Sets the function to extract Observation for the ego agent.

Parameters:

ego_extractor (Callable[[Observation], Any]) – Function to extract Observation into the type the ego agent expects
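
A hedged sketch, assuming the stacked observation reaching the extractor is array-like; if it is a structured Observation object, pull out its payload instead:

    import numpy as np

    # Flatten the stacked frames into a 1-D float32 vector before they reach
    # the ego agent; adapt this to whatever the ego policy network expects.
    env.set_ego_extractor(lambda obs: np.asarray(obs, dtype=np.float32).ravel())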

set_ego_ind(new_ind, silence_partner_warning=False)

Sets the current player number for the ego agent

Warning: Modifying the ego_ind after partners have been added will change the player number of those partners as well.

Parameters:
  • new_ind (int) – the new index of the ego player

  • silence_partner_warning (bool) – Whether to suppress the partner warning

set_partnerid(agent_id, player_num=1)

Set the current partner agent to use

Parameters:
  • agent_id (int) – agent_id to use as current partner

  • player_num (int) – The player number

Return type:

None

set_resample_policy(resample_policy)

Set the resample_partner method, e.g. to round-robin (“robin”) or random (“random”)

Parameters:

resample_policy (str) – The new resampling policy to use. Valid values are: “default”, “robin”, “random”, or “null”

Return type:

None
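
For example, to cycle deterministically through the registered partners instead of sampling at random:

    # Use round-robin partner sampling for subsequent episodes.
    env.set_resample_policy("robin")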

step(action)

Run one timestep from the perspective of the ego-agent. This involves calling the ego_step function and the alt_step function to get to the next observation of the ego agent.

Accepts the ego-agent’s action and returns a tuple of (observation, reward, done, info) from the perspective of the ego agent.

Note that when the environment is done, the final observation is the latest observation provided by the environment, which may be the same as the previous observation given to the agent, especially in turn-based settings.

Parameters:

action (ndarray) – An action provided by the ego-agent.

Returns:

observation: Ego-agent’s next observation

reward: Amount of reward returned after previous action

terminated: Whether the episode has ended (call reset() if True)

truncated: Whether the episode was truncated (call reset() if True)

info: Extra information about the environment

Return type:

tuple[Observation | Any, float, bool, bool, dict[str, Any]]
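
A hedged sketch of the ego-centric rollout loop implied by reset() and step(); the random action stands in for a real ego policy, and partner actions are chosen internally by the registered partner agents:

    # One episode from the ego agent's perspective.
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder for the ego policy
        obs, reward, terminated, truncated, info = env.step(action)
    env.close()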

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance