pantheonrl.common.agents.OffPolicyAgent

class OffPolicyAgent(model, log_interval=None, working_timesteps=1000, callback=None, tb_log_name='OffPolicyAgent')[source]

Bases: Agent

Agent representing an off-policy learning algorithm (e.g., DQN or SAC).

The get_action and update functions are based on the learn function from OffPolicyAlgorithm.

Parameters:
  • model (OffPolicyAlgorithm) – Model representing the agent’s learning algorithm

  • log_interval – Optional log interval for policy logging

  • working_timesteps – Estimated number of timesteps the agent will train for

  • callback – Optional callback fed into the OffPolicyAlgorithm

  • tb_log_name – Name for tensorboard log

Warning

The model will continue training beyond the working_timesteps point, but it may not behave identically to one initialized with an accurate estimate.
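
For orientation, a minimal construction sketch. It assumes a Stable Baselines3 DQN model built on a standard Gym environment purely to define observation and action spaces; in a real PantheonRL setup, the model's spaces should match the partner's view of the multi-agent environment.

    import gym
    from stable_baselines3 import DQN
    from pantheonrl.common.agents import OffPolicyAgent

    # Spaces-only stand-in; in practice, use the partner's view of your
    # multi-agent environment here (assumption for illustration).
    dummy_env = gym.make("CartPole-v1")

    partner = OffPolicyAgent(
        DQN("MlpPolicy", dummy_env),
        working_timesteps=50_000,   # rough estimate of the total training budget
        tb_log_name="dqn_partner",  # tensorboard log name (optional)
    )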

Methods

get_action — Return an action given an observation.

learn — Call the model's learn function with the given parameters.

update — Add new rewards and done information.

get_action(obs)[source]

Return an action given an observation.

This function may also update the agent during training.

Parameters:

obs (Observation) – The observation to use

Returns:

The action to take

Return type:

ndarray
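
A minimal usage sketch, assuming partner is the agent constructed above and that the observation can be passed as a plain NumPy array; if your version of the library expects an Observation wrapper (as the signature suggests), construct one around the array first.

    import numpy as np

    obs = np.zeros(4, dtype=np.float32)  # CartPole-shaped observation (illustrative)
    action = partner.get_action(obs)     # may also train the agent mid-episode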

learn(**kwargs)[source]

Call the model’s learn function with the given parameters.

Return type:

None
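
Since keyword arguments are forwarded to the wrapped model, the sketch below passes total_timesteps, a standard argument to OffPolicyAlgorithm.learn; treat the exact accepted kwargs as those of your Stable Baselines3 version.

    # Forward standard SB3 learn() arguments through the agent (sketch).
    partner.learn(total_timesteps=50_000)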

update(reward, done)[source]

Add new rewards and done information.

The agent trains when the model determines that it has collected enough timesteps.

Parameters:
  • reward (float) – The reward received from the previous action step

  • done (bool) – Whether the game is done

Return type:

None
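
To show how get_action and update pair up, here is a hedged sketch of the loop an environment drives. PantheonRL's multi-agent environments perform these calls internally; the Gym environment below is a hypothetical stand-in for the partner's single-agent view, using the classic 4-tuple step API.

    import gym

    env = gym.make("CartPole-v1")  # illustrative stand-in for the partner's view
    obs = env.reset()
    done = False
    while not done:
        action = partner.get_action(obs)         # query the agent for a move
        obs, reward, done, _ = env.step(action)  # classic Gym step API (assumption)
        partner.update(reward, done)             # report outcome; trains when enough steps are buffered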