Computational Models of Humans

It's difficult to collect data with real robots.
Machine learning has seen great success in domains where vast amounts of data are easily available, such as image recognition and natural language processing, or where an accurate simulator enables virtual data collection, as in games like Go and DotA. In robotics, physical and practical constraints (time, cost, and human effort) limit the amount of data we can collect, and despite recent progress, physics-based simulators have yet to reach the accuracy required to train robots in simulation and deploy them in the real world. In light of this, we develop active learning methods that extract as much information as possible from each interaction with a human.

Active Learning of Reward Functions

Through pairwise comparison queries, we learn reward functions that capture human preferences.
It is often difficult for humans to provide good demonstrations to robots. Instead of learning rewards from demonstrations, we take a preference-based approach: we query the human with pairs of trajectories and learn the reward function from their comparisons. To significantly improve data efficiency, we formulate query selection as an active learning optimization and synthesize near-optimal queries at every iteration.
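
The sketch below illustrates this pipeline under simplifying assumptions: the reward is linear in hand-designed trajectory features, the belief over reward weights is represented with weighted samples, and query informativeness is scored with a simple volume-removal-style objective. All names (NUM_FEATURES, phi_a, phi_b, and so on) are illustrative placeholders, not the papers' actual implementation.

```python
import numpy as np

# A minimal sketch of preference-based reward learning, assuming a reward
# that is linear in trajectory features: R(xi) = w . phi(xi).

NUM_FEATURES = 4
rng = np.random.default_rng(0)

def sample_prior(n=10_000):
    """Represent the belief over reward weights w with weighted samples,
    starting from a uniform prior over the unit sphere."""
    w = rng.normal(size=(n, NUM_FEATURES))
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def preference_likelihood(w, phi_a, phi_b):
    """P(human prefers trajectory A over B | w): a softmax (Bradley-Terry)
    response model on the difference of trajectory features."""
    return 1.0 / (1.0 + np.exp(-(w @ (phi_a - phi_b))))

def update_belief(w_samples, belief, phi_a, phi_b, preferred_a):
    """Reweight the samples after observing the human's answer to one query."""
    lik = preference_likelihood(w_samples, phi_a, phi_b)
    belief = belief * (lik if preferred_a else 1.0 - lik)
    return belief / belief.sum()

def query_value(w_samples, belief, phi_a, phi_b):
    """Volume-removal-style value of a candidate query: highest when the
    current belief is maximally unsure which answer the human will give,
    so either answer removes a large part of the hypothesis space."""
    p_a = float(np.sum(belief * preference_likelihood(w_samples, phi_a, phi_b)))
    return min(p_a, 1.0 - p_a)

# Example: pick the best of a few random candidate queries, ask, update.
w_samples = sample_prior()
belief = np.full(len(w_samples), 1.0 / len(w_samples))
candidates = [(rng.normal(size=NUM_FEATURES), rng.normal(size=NUM_FEATURES))
              for _ in range(50)]
best = max(candidates, key=lambda q: query_value(w_samples, belief, *q))
belief = update_belief(w_samples, belief, *best, preferred_a=True)  # simulated answer
```

Note that the cited work synthesizes queries by optimizing directly over trajectories rather than scoring a fixed candidate pool; scoring random candidates is used here only to keep the example short.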

Batch Active Learning of Reward Functions

Batch methods strike a balance between data efficiency and time efficiency.
Active learning algorithms often suffer from excessive computation times, which makes real-time interaction with humans impractical. Using volume removal and information entropy formulations, we generate an entire batch of queries at once, chosen to be both diverse and informative. Because the human answers many queries per round of computation, this makes the algorithms feasible for interaction with humans.
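
Continuing the sketch above, one simple way to realize this idea is a greedy selection that trades off each candidate's informativeness against its diversity relative to the queries already in the batch. The scoring function query_value and the sample-based belief are reused from the previous sketch; the div_weight trade-off and the distance-based diversity bonus are illustrative assumptions, and the paper studies several batch selection strategies.

```python
import numpy as np

def select_batch(candidates, w_samples, belief, batch_size=10, div_weight=0.5):
    """Greedily build a batch that is informative (high query value) and
    diverse (far from already-selected queries in feature-difference space)."""
    feats = np.array([phi_a - phi_b for (phi_a, phi_b) in candidates])
    scores = np.array([query_value(w_samples, belief, phi_a, phi_b)
                       for (phi_a, phi_b) in candidates])
    batch = [int(np.argmax(scores))]           # seed with the single best query
    while len(batch) < batch_size:
        # Distance from each candidate to its nearest already-selected query.
        dists = np.linalg.norm(feats[:, None] - feats[batch][None], axis=2).min(axis=1)
        combined = scores + div_weight * dists
        combined[batch] = -np.inf              # never pick a query twice
        batch.append(int(np.argmax(combined)))
    return [candidates[i] for i in batch]
```

The whole batch is then presented to the human, and the belief is updated with all of the answers before the next batch is synthesized, amortizing the expensive query selection across many human responses.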

Incomplete List of Related Publications:
  • Dorsa Sadigh, Anca D. Dragan, S. Shankar Sastry, Sanjit A. Seshia. Active Preference-Based Learning of Reward Functions. Proceedings of Robotics: Science and Systems (RSS), July 2017. [PDF]
  • Erdem Bıyık, Dorsa Sadigh. Batch Active Preference-Based Learning of Reward Functions. Proceedings of the 2nd Conference on Robot Learning (CoRL), October 2018. [PDF]