pantheonrl.algos.bc

Behavioural Cloning (BC). Trains a policy by applying supervised learning to a fixed dataset of (observation, action) pairs generated by an expert demonstrator.

https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/algorithms/bc.py
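As a rough illustration of the idea described above (supervised learning on fixed expert pairs), here is a toy sketch for discrete, hashable observations. It is not the BC class in this module: `fit_tabular_policy` and the demonstration data are hypothetical names made up for this example, and a tabular majority vote stands in for whatever policy network is trained in practice.

```python
from collections import Counter, defaultdict


def fit_tabular_policy(demonstrations):
    """Toy behavioural cloning for discrete observations (hypothetical helper).

    For each observation, count the expert's actions and keep the most
    frequent one. This is the greedy action of the maximum-likelihood
    tabular policy fitted to the demonstration pairs.
    """
    counts = defaultdict(Counter)
    for obs, action in demonstrations:
        counts[obs][action] += 1
    # Map each seen observation to the expert's most common action.
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}


# Made-up expert data: (observation, action) pairs.
expert_pairs = [
    ("left_wall", "right"),
    ("left_wall", "right"),
    ("left_wall", "up"),
    ("goal", "stay"),
]
policy = fit_tabular_policy(expert_pairs)
```

The cloned policy only covers observations that appear in the dataset, which is the usual limitation of learning from a fixed set of demonstrations.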

Functions

reconstruct_policy

Reconstruct a saved policy.

Args:
    policy_path: path where .save_policy() has been run.
    device: device on which to load the policy.

Returns:
    policy: policy with reloaded weights.

Classes

BC

Behavioral cloning (BC).

BCShell

Shell class for BC policy.

ConstantLRSchedule

A callable that returns a constant learning rate.
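A constant schedule of this kind can be sketched in a few lines. The class below is an illustrative stand-in, not the module's own code: the name `ConstantLRSketch`, the default value, and the `progress_remaining` argument (borrowed from the convention of schedules that receive the remaining training progress) are all assumptions.

```python
class ConstantLRSketch:
    """Hypothetical stand-in: a callable that always returns the same rate."""

    def __init__(self, lr: float = 1e-3):
        # The default of 1e-3 is an assumption chosen for this example.
        self.lr = lr

    def __call__(self, progress_remaining: float) -> float:
        # The argument is accepted for interface compatibility and ignored.
        return self.lr


schedule = ConstantLRSketch(3e-4)
start_lr = schedule(1.0)  # start of training
end_lr = schedule(0.0)    # end of training
```

Wrapping the constant in a callable lets code that expects a schedule function accept a fixed learning rate without special-casing it.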

EpochOrBatchIteratorWithProgress

Wraps DataLoader so that all BC batches can be processed in one for-loop. Also uses tqdm to show progress in stdout.

Args:
    data_loader: An iterable over data dicts, as used in BC.
    n_epochs: The number of epochs to iterate through in one call to __iter__. Exactly one of n_epochs and n_batches should be provided.
    n_batches: The number of batches to iterate through in one call to __iter__. Exactly one of n_epochs and n_batches should be provided.
    on_epoch_end: A callback function without parameters to be called at the end of every epoch.
    on_batch_end: A callback function without parameters to be called at the end of every batch.
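The iteration contract described above can be sketched as a plain generator. This is a simplified illustration under stated assumptions, not the class itself: `iterate_batches` is a hypothetical name, the tqdm progress bar is omitted, and the error raised for an empty loader is a choice made for this example.

```python
def iterate_batches(data_loader, n_epochs=None, n_batches=None,
                    on_epoch_end=None, on_batch_end=None):
    """Yield batches until n_epochs or n_batches is reached (hypothetical sketch)."""
    # Exactly one of the two stopping conditions must be given.
    # Because this is a generator, the check runs on the first iteration.
    if (n_epochs is None) == (n_batches is None):
        raise ValueError("Provide exactly one of n_epochs and n_batches.")
    epochs_done = 0
    batches_done = 0
    while True:
        saw_batch = False
        for batch in data_loader:
            saw_batch = True
            yield batch
            batches_done += 1
            if on_batch_end is not None:
                on_batch_end()
            if n_batches is not None and batches_done >= n_batches:
                return
        if not saw_batch:
            # Guard against looping forever over an empty loader.
            raise ValueError("data_loader yielded no batches.")
        epochs_done += 1
        if on_epoch_end is not None:
            on_epoch_end()
        if n_epochs is not None and epochs_done >= n_epochs:
            return


# Made-up loader: three batches per epoch.
loader = [{"obs": 0}, {"obs": 1}, {"obs": 2}]
events = []
by_epoch = list(iterate_batches(loader, n_epochs=2,
                                on_epoch_end=lambda: events.append("epoch")))
by_batch = list(iterate_batches(loader, n_batches=4,
                                on_batch_end=lambda: events.append("batch")))
```

With `n_batches`, the loop restarts the loader as often as needed, so a batch budget can span several epochs or stop partway through one.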