pantheonrl.algos.modular.policies
Implementation of the policy for the ModularAlgorithm
Classes
Policy class for actor-critic algorithms (has both policy and value prediction). Used by A2C, PPO, and the like.

:param observation_space: (gym.spaces.Space) Observation space
:param action_space: (gym.spaces.Space) Action space
:param lr_schedule: (Callable) Learning rate schedule (could be constant)
:param net_arch: ([int or dict]) The specification of the policy and value networks.
:param device: (str or torch.device) Device on which the code should run.
:param activation_fn: (Type[nn.Module]) Activation function
:param ortho_init: (bool) Whether to use orthogonal initialization
:param use_sde: (bool) Whether to use State Dependent Exploration
:param log_std_init: (float) Initial value for the log standard deviation
:param full_std: (bool) Whether to use (n_features x n_actions) parameters for the std instead of only (n_features,) when using gSDE
:param sde_net_arch: ([int]) Network architecture for extracting features when using gSDE. If None, the latent features from the policy will be used. Pass an empty list to use the states as features.
:param use_expln: (bool) Use |
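
The `lr_schedule` parameter listed above is a callable rather than a plain number. The following is a minimal sketch of what such a callable might look like, assuming this policy follows the Stable-Baselines3 convention in which the schedule receives the remaining training progress (1.0 at the start, 0.0 at the end) and returns the learning rate. The helper names `constant_schedule` and `linear_schedule` are hypothetical and not part of pantheonrl.

```python
from typing import Callable

# Assumed convention: progress_remaining goes from 1.0 (start) to 0.0 (end).
Schedule = Callable[[float], float]


def constant_schedule(value: float) -> Schedule:
    """Hypothetical helper: return a schedule that ignores training progress."""

    def schedule(progress_remaining: float) -> float:
        return value

    return schedule


def linear_schedule(initial_value: float) -> Schedule:
    """Hypothetical helper: decay linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value

    return schedule


lr_constant = constant_schedule(3e-4)  # "could be constant", as the docstring notes
lr_linear = linear_schedule(3e-4)      # shrinks as training progresses
```

Either callable could then be passed as the `lr_schedule` argument, provided the assumed convention holds for this class.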