Computational Models of Humans

It's difficult to collect data with real robots.
Machine learning has seen great success in domains where vast amounts of data are easily available, such as image recognition and natural language processing, or where an accurate simulator allows virtual data collection, as in games like Go and DotA. In robotics, physical and practical constraints (such as time, cost, and human effort) limit the amount of available data, and despite recent progress, physics-based simulators have yet to reach the accuracy required to train robots in simulation and deploy them in the real world. In light of this, we develop active learning methods.

Active Learning of Reward Functions

We learn humans' reward functions from their preferences, expressed through pairwise comparisons.
It is often difficult for people to provide good demonstrations to robots. We therefore take a preference-based learning approach, where we learn the reward function from comparison queries: the human is shown two candidate trajectories and simply picks the one they prefer. To significantly improve data efficiency, we formulate query selection as an active learning optimization and synthesize near-optimal queries.
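As a concrete illustration, here is a minimal sketch of this pipeline under simple assumptions: a reward that is linear in fixed trajectory features, a Bradley-Terry response model for the human's answers, and a crude importance-sampling posterior over the reward weights. The feature dimension, the candidate query pool, and all names below are hypothetical, not details of the publications listed further down.

```python
import numpy as np

rng = np.random.default_rng(0)
PHI_DIM = 4  # dimensionality of the trajectory feature vector (assumed)

def p_prefers_a(w, phi_a, phi_b):
    """Bradley-Terry probability that the human prefers trajectory A over B."""
    return 1.0 / (1.0 + np.exp(-(w @ (phi_a - phi_b))))

def posterior_samples(data, n=2000):
    """Approximate posterior samples of w via self-normalized importance
    sampling from a uniform prior on the unit sphere (crude, but enough
    for a sketch). `data` holds (phi_a, phi_b, chose_a) triples."""
    w = rng.normal(size=(n, PHI_DIM))
    w /= np.linalg.norm(w, axis=1, keepdims=True)
    log_like = np.zeros(n)
    for phi_a, phi_b, chose_a in data:
        p = p_prefers_a(w, phi_a, phi_b)
        log_like += np.log(p) if chose_a else np.log(1.0 - p)
    weights = np.exp(log_like - log_like.max())
    weights /= weights.sum()
    return w[rng.choice(n, size=n, p=weights)]

def select_query(candidates, samples):
    """Pick the pair whose answer is most uncertain under the posterior; for
    this likelihood that is a simplified maximum-volume-removal choice."""
    def score(pair):
        phi_a, phi_b = pair
        p = p_prefers_a(samples, phi_a, phi_b).mean()
        return min(p, 1.0 - p)  # worst-case expected posterior mass removed
    return max(candidates, key=score)

# Usage: each candidate query is a hypothetical pair of feature vectors.
candidates = [tuple(rng.normal(size=(2, PHI_DIM))) for _ in range(50)]
data = []  # (phi_a, phi_b, chose_a) triples collected from the human so far
query = select_query(candidates, posterior_samples(data))
```

In a full loop, the selected pair would be shown to the human, their answer appended to `data`, and the posterior resampled before choosing the next query.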

Batch Active Learning of Reward Functions

Batch methods strike a balance between data efficiency and time efficiency.
Active learning algorithms often suffer from excessive computation times, which makes real-time interaction with humans impractical. Using volume removal and/or information entropy formulations, we generate entire batches of queries at once that are both diverse and informative, making the algorithms fast enough for live interaction with humans.
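Under the same assumptions as the sketch above, batch selection can be approximated greedily: score every candidate query by how uncertain its answer is under the posterior samples, then fill the batch in score order while rejecting queries whose feature differences lie too close to an already-selected one. The threshold `eps` and this particular greedy scheme are illustrative stand-ins for the diversity heuristics developed in the papers below.

```python
import numpy as np

def select_batch(candidates, samples, batch_size=10, eps=0.5):
    """Greedily pick a batch of queries that are individually informative
    (answer near 50/50 under the posterior samples) and mutually diverse
    (feature differences at least `eps` apart)."""
    diffs = np.array([phi_a - phi_b for phi_a, phi_b in candidates])
    p = 1.0 / (1.0 + np.exp(-(samples @ diffs.T)))  # (n_samples, n_candidates)
    scores = np.minimum(p.mean(axis=0), 1.0 - p.mean(axis=0))
    batch = []
    for i in np.argsort(-scores):  # most informative candidates first
        if all(np.linalg.norm(diffs[i] - diffs[j]) >= eps for j in batch):
            batch.append(i)
        if len(batch) == batch_size:
            break
    return [candidates[i] for i in batch]
```

Enforcing the distance constraint in feature-difference space is what keeps the batch from collapsing onto near-duplicate queries, which would waste the human's effort without adding information.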

Incomplete List of Related Publications:
  • Erdem Bıyık*, Nicolas Huynh*, Mykel J. Kochenderfer, Dorsa Sadigh. Active Preference-Based Gaussian Process Regression for Reward Learning. Proceedings of Robotics: Science and Systems (RSS), July 2020. [PDF]
  • Minae Kwon, Erdem Bıyık, Aditi Talati, Karan Bhasin, Dylan P. Losey, Dorsa Sadigh. When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans. ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2020. [PDF]
  • Chandrayee Basu, Erdem Bıyık, Zhixun He, Mukesh Singhal, Dorsa Sadigh. Active Learning of Reward Dynamics from Hierarchical Queries. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019. [PDF]
  • Erdem Bıyık, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, Dorsa Sadigh. Asking Easy Questions: A User-Friendly Approach to Active Reward Learning. Proceedings of the 3rd Conference on Robot Learning (CoRL), October 2019. [PDF]
  • Malayandi Palan*, Nicholas C. Landolfi*, Gleb Shevchuk, Dorsa Sadigh. Learning Reward Functions by Integrating Human Demonstrations and Preferences. Proceedings of Robotics: Science and Systems (RSS), June 2019. [PDF]
  • Erdem Bıyık, Dorsa Sadigh. Batch Active Preference-Based Learning of Reward Functions. Proceedings of the 2nd Conference on Robot Learning (CoRL), October 2018. [PDF]
  • Dorsa Sadigh, Anca D. Dragan, S. Shankar Sastry, Sanjit A. Seshia. Active Preference-Based Learning of Reward Functions. Proceedings of Robotics: Science and Systems (RSS), July 2017. [PDF]