pantheonrl.algos.bc.BC

class BC(observation_space, action_space, *, policy_class=<class 'pantheonrl.common.util.FeedForward32Policy'>, policy_kwargs=None, expert_data=None, optimizer_cls=<class 'torch.optim.adam.Adam'>, optimizer_kwargs=None, ent_weight=0.001, l2_weight=0.0, device='auto')

Bases: object

Behavioral cloning (BC).

Recovers a policy via supervised learning on observation-action Tensor pairs, sampled from a Torch DataLoader or any Iterator that duck-types torch.utils.data.DataLoader.

Args:
  • observation_space: the observation space of the environment.

  • action_space: the action space of the environment.

  • policy_class: used to instantiate the imitation policy.

  • policy_kwargs: keyword arguments passed to the policy’s constructor.

  • expert_data: if not None, self.set_expert_data_loader(expert_data) is called immediately during initialization.

  • optimizer_cls: optimizer to use for supervised training.

  • optimizer_kwargs: keyword arguments, excluding learning rate and weight decay, for optimizer construction.

  • ent_weight: scaling applied to the policy’s entropy regularization.

  • l2_weight: scaling applied to the policy’s L2 regularization.

  • device: name or identity of the device to place the policy on.
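The entropy and L2 weights both enter a single supervised objective. The sketch below illustrates that objective for a discrete action space in plain Python, assuming the loss takes the form used by the imitation library's BC: the mean negative log-likelihood of the expert actions, minus ent_weight times the mean policy entropy, plus l2_weight times half the squared L2 norm of the policy parameters. The helpers bc_loss and log_softmax are hypothetical illustrations, not PantheonRL functions; the class itself evaluates its loss with Torch through the policy.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bc_loss(
    batch_logits: Sequence[Sequence[float]],  # policy logits, one row per observation
    expert_acts: Sequence[int],  # expert action index for each observation
    params: Sequence[float],  # flattened policy parameters (hypothetical)
    ent_weight: float = 0.001,  # same default as BC's ent_weight
    l2_weight: float = 0.0,  # same default as BC's l2_weight
) -> float:
    """Hypothetical BC objective: NLL - ent_weight * entropy + l2_weight * ||params||^2 / 2."""
    n = len(expert_acts)
    neglogp = 0.0
    entropy = 0.0
    for logits, act in zip(batch_logits, expert_acts):
        logp = log_softmax(logits)
        neglogp -= logp[act]  # negative log-likelihood of the expert action
        entropy -= sum(math.exp(lp) * lp for lp in logp)  # H = -sum p log p
    neglogp /= n
    entropy /= n
    l2_norm = sum(w * w for w in params) / 2  # halved so its gradient is simply w
    return neglogp - ent_weight * entropy + l2_weight * l2_norm
```

Because the entropy term is subtracted, a positive ent_weight rewards less peaked action distributions, while a positive l2_weight penalizes large parameter values.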

Methods

save_policy

Save policy to a path. Can be reloaded by reconstruct_policy().

set_expert_data_loader

Set the expert data loader, which yields batches of obs-act pairs.

train

Train with supervised learning for some number of epochs.

Attributes

DEFAULT_BATCH_SIZE

Default batch size for DataLoader automatically constructed from Transitions.

Parameters:
  • observation_space (Space) –

  • action_space (Space) –

  • policy_class (Type[BasePolicy]) –

  • policy_kwargs (Mapping[str, Any] | None) –

  • expert_data (Iterable[Mapping] | TransitionsMinimal | None) –

  • optimizer_cls (Type[Optimizer]) –

  • optimizer_kwargs (Dict[str, Any] | None) –

  • ent_weight (float) –

  • l2_weight (float) –

  • device (str | device) –

DEFAULT_BATCH_SIZE: int = 32

Default batch size for DataLoader automatically constructed from Transitions. See set_expert_data_loader().

save_policy(policy_path)

Save policy to a path. Can be reloaded by reconstruct_policy().

Args:
  • policy_path: path to save policy to.

Parameters:

policy_path (str) –

Return type:

None

set_expert_data_loader(expert_data)

Set the expert data loader, which yields batches of obs-act pairs. Changing the expert data loader on demand is useful for DAgger and other interactive algorithms.

Args:
  • expert_data: either a Torch DataLoader, any other iterator that yields dictionaries containing “obs” and “acts” Tensors or NumPy arrays, or a TransitionsMinimal instance. If it is a TransitionsMinimal instance, it is automatically converted into a shuffled DataLoader with batch size BC.DEFAULT_BATCH_SIZE.

Parameters:

expert_data (Iterable[Mapping] | TransitionsMinimal) –

Return type:

None
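The conversion of a TransitionsMinimal instance into a shuffled loader, and the dictionary format that any other expert iterator must yield, can be sketched in plain Python. This is a hypothetical stand-in for a torch.utils.data.DataLoader built with shuffle=True and batch_size=BC.DEFAULT_BATCH_SIZE, assuming the transitions are available as two equal-length sequences and that the final partial batch is kept (DataLoader's default, drop_last=False). The name shuffled_batches is illustrative only, and plain lists stand in for Tensors or arrays.

```python
import random
from typing import Dict, Iterator, List, Optional, Sequence

DEFAULT_BATCH_SIZE = 32  # same value as BC.DEFAULT_BATCH_SIZE


def shuffled_batches(
    obs: Sequence,
    acts: Sequence,
    batch_size: int = DEFAULT_BATCH_SIZE,
    seed: Optional[int] = None,
) -> Iterator[Dict[str, List]]:
    """Hypothetical stand-in for a shuffled DataLoader over obs-act pairs.

    Each yielded item is a dict with "obs" and "acts" keys, the format an
    expert iterator passed to set_expert_data_loader is expected to yield.
    """
    if len(obs) != len(acts):
        raise ValueError("obs and acts must have the same length")
    order = list(range(len(obs)))
    random.Random(seed).shuffle(order)  # shuffle once per call; pass a seed to reproduce
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]  # the final batch may be smaller
        yield {"obs": [obs[i] for i in idx], "acts": [acts[i] for i in idx]}
```

With 100 transitions and the default batch size of 32, one pass yields ceil(100 / 32) = 4 batches of sizes 32, 32, 32 and 4, so under these assumptions one epoch in train() corresponds to 4 batches.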

train(*, n_epochs=None, n_batches=None, on_epoch_end=None, on_batch_end=None, log_interval=100)

Train with supervised learning for some number of epochs. Here an ‘epoch’ is just a complete pass through the expert data loader, as set by self.set_expert_data_loader().

Args:
  • n_epochs: number of complete passes made through the expert data before ending training. Provide exactly one of n_epochs and n_batches.

  • n_batches: number of batches loaded from the dataset before ending training. Provide exactly one of n_epochs and n_batches.

  • on_epoch_end: optional callback with no parameters to run at the end of each epoch.

  • on_batch_end: optional callback with no parameters to run at the end of each batch.

  • log_interval: log stats after every log_interval batches.

Parameters:
  • n_epochs (int | None) –

  • n_batches (int | None) –

  • on_epoch_end (Callable[[], None] | None) –

  • on_batch_end (Callable[[], None] | None) –

  • log_interval (int) –
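The epoch and batch bookkeeping described for train() can be sketched without any framework. In this hypothetical sketch, run_training, make_loader and update are illustrative names rather than PantheonRL APIs: make_loader stands in for one pass over the expert data loader, and update stands in for a single gradient step on the BC loss. It assumes training stops as soon as the n_batches budget is spent, even in the middle of an epoch.

```python
from typing import Callable, Iterable, List, Mapping, Optional


def run_training(
    make_loader: Callable[[], Iterable[Mapping]],  # returns one fresh pass over the expert data
    update: Callable[[Mapping], float],  # hypothetical optimizer step; returns the batch loss
    *,
    n_epochs: Optional[int] = None,
    n_batches: Optional[int] = None,
    on_epoch_end: Optional[Callable[[], None]] = None,
    on_batch_end: Optional[Callable[[], None]] = None,
    log_interval: int = 100,
) -> List[str]:
    """Hypothetical sketch of train()'s stopping rules; returns the log lines produced."""
    if (n_epochs is None) == (n_batches is None):
        raise ValueError("Provide exactly one of n_epochs and n_batches.")
    logs: List[str] = []
    batch_num = 0
    epoch = 0
    while n_epochs is None or epoch < n_epochs:
        seen = 0
        for batch in make_loader():  # one epoch = one complete pass
            loss = update(batch)
            batch_num += 1
            seen += 1
            if batch_num % log_interval == 0:
                logs.append(f"batch={batch_num} loss={loss:.4f}")
            if on_batch_end is not None:
                on_batch_end()
            if n_batches is not None and batch_num >= n_batches:
                return logs  # batch budget spent, possibly mid-epoch
        if seen == 0:
            raise ValueError("expert data loader yielded no batches")
        epoch += 1
        if on_epoch_end is not None:
            on_epoch_end()
    return logs
```

For example, with a loader that yields 4 batches, n_epochs=3 triggers 12 batch callbacks and 3 epoch callbacks, and with log_interval=5 stats are logged after batches 5 and 10.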