pantheonrl.algos.bc.BC
- class BC(observation_space, action_space, *, policy_class=<class 'pantheonrl.common.util.FeedForward32Policy'>, policy_kwargs=None, expert_data=None, optimizer_cls=<class 'torch.optim.adam.Adam'>, optimizer_kwargs=None, ent_weight=0.001, l2_weight=0.0, device='auto')[source]
Bases: object
Behavioral cloning (BC).
Recovers a policy via supervised learning on observation-action Tensor pairs, sampled from a Torch DataLoader or any iterator that duck-types torch.utils.data.DataLoader.
Args:
observation_space: the observation space of the environment.
action_space: the action space of the environment.
policy_class: used to instantiate the imitation policy.
policy_kwargs: keyword arguments passed to the policy’s constructor.
expert_data: if not None, then self.set_expert_data_loader(expert_data) is called immediately during initialization.
optimizer_cls: optimiser to use for supervised training.
optimizer_kwargs: keyword arguments, excluding learning rate and weight decay, for optimiser construction.
ent_weight: scaling applied to the policy’s entropy regularization.
l2_weight: scaling applied to the policy’s L2 regularization.
device: name/identity of device to place policy on.
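One way to read ent_weight and l2_weight is as coefficients on two extra terms added to the supervised loss. The sketch below assumes a common formulation in which an entropy bonus is subtracted from the negative log-likelihood and an L2 penalty is added to it. The function name bc_loss and its arguments are hypothetical, and the formula is an assumption for illustration, not something taken from the pantheonrl source.

```python
def bc_loss(neg_log_prob, entropy, l2_norm_sq, ent_weight=0.001, l2_weight=0.0):
    """Combine behavioral-cloning loss terms (assumed formulation).

    neg_log_prob: mean negative log-likelihood of the expert actions.
    entropy:      mean entropy of the policy's action distribution.
    l2_norm_sq:   sum of squared policy parameters.
    """
    ent_loss = -ent_weight * entropy   # higher entropy lowers the loss
    l2_loss = l2_weight * l2_norm_sq   # larger parameters raise the loss
    return neg_log_prob + ent_loss + l2_loss


# With the documented defaults (ent_weight=0.001, l2_weight=0.0),
# only the entropy term adjusts the supervised loss.
loss = bc_loss(neg_log_prob=1.2, entropy=0.5, l2_norm_sq=40.0)
```

Under this reading, raising ent_weight discourages overly confident action distributions, while raising l2_weight discourages large parameter values.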
Methods
save_policy(policy_path): Save policy to a path. Can be reloaded by .reconstruct_policy().
set_expert_data_loader(expert_data): Set the expert data loader, which yields batches of obs-act pairs. Changing the expert data loader on demand is useful for DAgger and other interactive algorithms.
train(*, n_epochs, n_batches, on_epoch_end, on_batch_end, log_interval): Train with supervised learning for some number of epochs. Here an ‘epoch’ is a complete pass through the expert data loader, as set by self.set_expert_data_loader().
Attributes
DEFAULT_BATCH_SIZE: Default batch size for DataLoader automatically constructed from Transitions.
- Parameters:
observation_space (Space) –
action_space (Space) –
policy_class (Type[BasePolicy]) –
policy_kwargs (Mapping[str, Any] | None) –
expert_data (Iterable[Mapping] | TransitionsMinimal | None) –
optimizer_cls (Type[Optimizer]) –
optimizer_kwargs (Dict[str, Any] | None) –
ent_weight (float) –
l2_weight (float) –
device (str | device) –
- DEFAULT_BATCH_SIZE: int = 32
Default batch size for DataLoader automatically constructed from Transitions. See set_expert_data_loader().
- save_policy(policy_path)[source]
Save policy to a path. Can be reloaded by .reconstruct_policy().
Args:
policy_path: path to save policy to.
- Parameters:
policy_path (str) –
- Return type:
None
- set_expert_data_loader(expert_data)[source]
Set the expert data loader, which yields batches of obs-act pairs. Changing the expert data loader on demand is useful for DAgger and other interactive algorithms.
Args:
expert_data: Either a Torch DataLoader, any other iterator that yields dictionaries containing “obs” and “acts” Tensors or NumPy arrays, or a TransitionsMinimal instance. A TransitionsMinimal instance is automatically converted into a shuffled DataLoader with batch size BC.DEFAULT_BATCH_SIZE.
- Parameters:
expert_data (Iterable[Mapping] | TransitionsMinimal) –
- Return type:
None
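Because expert_data may be any iterable that yields dictionaries with "obs" and "acts" entries, a small custom loader can stand in for a Torch DataLoader. The following is a minimal sketch of such a loader. The class name DictBatchLoader is hypothetical, and it batches sequentially without shuffling. It works on anything that supports len() and slicing, so plain lists are used here only to keep the example dependency-free.

```python
class DictBatchLoader:
    """Re-iterable loader yielding {"obs": ..., "acts": ...} batches.

    Hypothetical helper: obs and acts may be any equal-length sequences
    that support slicing (NumPy arrays, Torch tensors, or lists).
    """

    def __init__(self, obs, acts, batch_size=32):
        if len(obs) != len(acts):
            raise ValueError("obs and acts must have the same length")
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.obs = obs
        self.acts = acts
        self.batch_size = batch_size

    def __iter__(self):
        # A fresh generator per call, so each epoch restarts from the beginning.
        for start in range(0, len(self.obs), self.batch_size):
            end = start + self.batch_size
            yield {"obs": self.obs[start:end], "acts": self.acts[start:end]}

    def __len__(self):
        # Number of batches per pass (ceiling division).
        return -(-len(self.obs) // self.batch_size)


loader = DictBatchLoader(list(range(5)), list(range(10, 15)), batch_size=2)
batches = list(loader)
```

In real use the inputs would be NumPy arrays or Tensors, as the method requires, and the loader would be passed in with something like bc.set_expert_data_loader(DictBatchLoader(obs_array, acts_array)), where bc is an existing BC instance and the array names are placeholders.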
- train(*, n_epochs=None, n_batches=None, on_epoch_end=None, on_batch_end=None, log_interval=100)[source]
Train with supervised learning for some number of epochs. Here an ‘epoch’ is a complete pass through the expert data loader, as set by self.set_expert_data_loader().
Args:
n_epochs: Number of complete passes made through expert data before ending training. Provide exactly one of n_epochs and n_batches.
n_batches: Number of batches loaded from dataset before ending training. Provide exactly one of n_epochs and n_batches.
on_epoch_end: Optional callback with no parameters to run at the end of each epoch.
on_batch_end: Optional callback with no parameters to run at the end of each batch.
log_interval: Log stats after every log_interval batches.
- Parameters:
n_epochs (int | None) –
n_batches (int | None) –
on_epoch_end (Callable[[], None] | None) –
on_batch_end (Callable[[], None] | None) –
log_interval (int) –
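The stopping rules described for train() can be illustrated with a small standalone loop. This is a sketch of the documented semantics only, not the library's implementation. The function run_training and its step argument are hypothetical, n_epochs and n_batches are assumed to be positive, and logging via log_interval is omitted.

```python
import itertools


def run_training(loader, step, *, n_epochs=None, n_batches=None,
                 on_epoch_end=None, on_batch_end=None):
    """Run `step` on batches until the epoch or batch budget is used up."""
    # Exactly one of the two budgets must be given.
    if (n_epochs is None) == (n_batches is None):
        raise ValueError("Provide exactly one of n_epochs and n_batches")
    batches_seen = 0
    # With a batch budget, keep cycling through the loader until it is met.
    epochs = range(n_epochs) if n_epochs is not None else itertools.count()
    for _ in epochs:
        batches_before = batches_seen
        for batch in loader:  # one complete pass is one epoch
            step(batch)
            batches_seen += 1
            if on_batch_end is not None:
                on_batch_end()
            if n_batches is not None and batches_seen >= n_batches:
                return batches_seen
        if batches_seen == batches_before:
            raise ValueError("loader yielded no batches")
        if on_epoch_end is not None:
            on_epoch_end()
    return batches_seen


# Three batches per pass and a budget of four: training stops
# one batch into the second pass.
updates = []
total = run_training([{"obs": [0], "acts": [1]}] * 3, updates.append, n_batches=4)
```

The loader must be re-iterable for multi-epoch training to work, which is why a one-shot generator would be a poor choice for expert_data.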