pantheonrl.common.agents.OnPolicyAgent

class OnPolicyAgent(model, log_interval=None, working_timesteps=1000, callback=None, tb_log_name='OnPolicyAgent')

Bases: Agent

Agent representing an on-policy learning algorithm (e.g., A2C or PPO).

The get_action and update functions are based on the learn function from OnPolicyAlgorithm.

Parameters:
  • model (OnPolicyAlgorithm) – Model representing the agent’s learning algorithm

  • log_interval – Optional log interval for policy logging

  • working_timesteps – Estimate for number of timesteps to train for.

  • callback – Optional callback fed into the OnPolicyAlgorithm

  • tb_log_name – Name for tensorboard log

Warning

The model continues training past working_timesteps, but it may not behave identically to a model initialized with an accurate estimate.
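
One way an inaccurate estimate can change behavior is through progress-based schedules. The sketch below assumes that working_timesteps plays the role of the total timestep count in a Stable-Baselines3-style progress value (1 - num_timesteps / total), and that the model uses a linearly decaying learning rate. Both points are assumptions made for illustration, and the helper names are hypothetical rather than part of PantheonRL.

```python
def progress_remaining(num_timesteps: int, working_timesteps: int) -> float:
    """Fraction of training left: 1.0 at the start, 0.0 when the estimate is reached."""
    return 1.0 - num_timesteps / working_timesteps


def linear_schedule(initial_lr: float, progress: float) -> float:
    """Hypothetical linearly decaying learning rate driven by the progress value."""
    return initial_lr * progress


# With an estimate of 1000 steps, the rate is halved by step 500.
lr_mid = linear_schedule(3e-4, progress_remaining(500, 1000))

# Past the estimate, the progress value (and this schedule) becomes negative.
lr_past = linear_schedule(3e-4, progress_remaining(1500, 1000))
```

With a constant learning rate, the estimate would have no such effect in this sketch, which is why the mismatch only matters for some configurations.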

Methods

  • get_action – Return an action given an observation.

  • learn – Call the model’s learn function with the given parameters.

  • update – Add new rewards and done information.

get_action(obs)

Return an action given an observation.

The agent saves the last transition into its buffer. It also updates the model if the buffer is full.

Parameters:

obs (Observation) – The observation to use

Returns:

The action to take

Return type:

ndarray
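
The buffering behavior described for get_action can be sketched with a small stand-in class. This is not PantheonRL's implementation: ToyBufferAgent, its attributes, and the random placeholder policy are all hypothetical, and the "model update" is reduced to a counter.

```python
import random


class ToyBufferAgent:
    """Hypothetical stand-in that mimics the buffering behavior described above."""

    def __init__(self, buffer_size: int = 4, seed: int = 0):
        self.buffer_size = buffer_size
        self.buffer = []      # completed (obs, action, reward, done) transitions
        self.pending = None   # last (obs, action) pair not yet stored
        self.reward, self.done = 0.0, False
        self.num_updates = 0
        self.rng = random.Random(seed)

    def get_action(self, obs):
        # Save the previous transition now that its reward is known.
        if self.pending is not None:
            self.buffer.append((*self.pending, self.reward, self.done))
        # "Update the model" once the buffer is full, then start a new rollout.
        if len(self.buffer) == self.buffer_size:
            self.num_updates += 1
            self.buffer.clear()
        action = self.rng.choice([0, 1])  # placeholder policy
        self.pending = (obs, action)
        return action

    def update(self, reward, done):
        # Remember the outcome of the most recent action.
        self.reward, self.done = reward, done


agent = ToyBufferAgent(buffer_size=4)
for step in range(9):
    agent.get_action(obs=step)
    agent.update(reward=1.0, done=False)
```

In this sketch the first call stores nothing, because no earlier transition exists yet, so the buffer fills on the fifth and ninth calls.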

learn(**kwargs)

Call the model’s learn function with the given parameters.

Return type:

None

update(reward, done)

Add new rewards and done information.

The reward is added to the buffer entry corresponding to the most recently recorded action.

Parameters:
  • reward (float) – The reward received from the previous action step

  • done (bool) – Whether the game is done

Return type:

None
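
The pairing between get_action and update can be illustrated with a minimal interaction loop. RewardRecorder is a hypothetical class written for this example, not the library's buffer. Accumulating the reward with += is likewise an assumption about how repeated updates for the same action might be handled.

```python
class RewardRecorder:
    """Hypothetical agent that only tracks which action each reward belongs to."""

    def __init__(self):
        self.entries = []  # one dict per recorded action

    def get_action(self, obs):
        action = obs % 2  # placeholder policy
        self.entries.append(
            {"obs": obs, "action": action, "reward": 0.0, "done": False}
        )
        return action

    def update(self, reward, done):
        # Credit the reward to the most recently recorded action.
        self.entries[-1]["reward"] += reward
        self.entries[-1]["done"] = done


agent = RewardRecorder()
for obs in range(3):
    action = agent.get_action(obs)
    reward = 1.0 if action == 1 else 0.0  # toy environment: reward action 1
    agent.update(reward, done=(obs == 2))

rewards = [entry["reward"] for entry in agent.entries]
```

The loop calls update once after each get_action, so every reward lands on the entry created by the action that produced it.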