Learning Reward Functions by Integrating Human Demonstrations and Preferences
When learning from humans, we typically use data from only one form of human feedback. In this work, we investigate whether we can leverage data from multiple modes of feedback to learn more effectively from humans.
Altruistic Autonomy: Beating Congestion on Shared Roads
We develop a mathematical model to analyze the effects of autonomous cars on traffic congestion. We present new notions of equilibria for shared roads where autonomous cars are present and are possibly altruistic. Our realistic simulations show that autonomous cars can halve the latency, and altruism halves it again.
Batch-Active Preference-Based Learning of Reward Functions
In this post, we discuss an efficient approach to reward learning. Focusing on preference-based learning methods, we show how batch-active methods can achieve sample efficiency together with computational efficiency. We analyze the practical tradeoff between informativeness and diversity among the elements of a batch, and propose several methods that strike a good balance. Lastly, we showcase our methods on several simulators, along with some usability studies.