Batch-Active Preference-Based Learning of Reward Functions

In this post, we discuss an efficient approach to reward learning. Focusing on preference-based learning, we show how batch-active methods can achieve sample efficiency together with computational efficiency. We examine in practice the tradeoff between informativeness and diversity among the elements of a batch, and propose several methods that strike a good balance between the two. Finally, we showcase our methods on several different simulators and in usability studies.
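To make the informativeness-versus-diversity tradeoff concrete before going further, here is a minimal sketch of one simple way a batch could be chosen. It is an illustration only and not necessarily one of the methods proposed in this post. The function `select_batch`, its parameters `batch_size` and `pool_size`, and the toy candidates and scores are all hypothetical names and values introduced for this example. It assumes each candidate is a point in some feature space and already has an informativeness score.

```python
import math

def select_batch(candidates, scores, batch_size, pool_size):
    """Return indices of a batch that is both informative and diverse.

    Hypothetical helper for illustration: keep the `pool_size` highest-scoring
    candidates, then greedily pick points that are far from those already chosen.
    """
    # Step 1 (informativeness): rank candidates by score, keep the best ones.
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    pool = ranked[:pool_size]

    # Step 2 (diversity): start from the top-scoring candidate, then repeatedly
    # add the pool member whose nearest already-chosen neighbor is farthest away.
    batch = [pool[0]]
    while len(batch) < min(batch_size, len(pool)):
        best = max(
            (i for i in pool if i not in batch),
            key=lambda i: min(math.dist(candidates[i], candidates[j]) for j in batch),
        )
        batch.append(best)
    return batch

# Toy data (assumed): the three highest-scoring candidates are near-duplicates.
candidates = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.0, 0.0), (9.0, 9.0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.1]

print(select_batch(candidates, scores, batch_size=3, pool_size=5))  # [0, 3, 4]
```

With `pool_size` equal to `batch_size`, the sketch reduces to picking the top-scoring candidates, which here would be three nearly identical points. Enlarging the pool trades a little informativeness for a batch whose elements are less redundant.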