The field of robotics has made significant advances over the past few decades, but the question of how we should treat humans (human designers, human operators, human users, or human observers) remains open: Should we assume humans are moving obstacles that can simply be avoided? Should we assume humans are rational agents who are most likely to take an optimal action?
A core challenge in developing safe interactive systems is predicting and anticipating how humans act and interact in response to robot actions. We focus on developing computational human models for cases where common human modeling assumptions fail, for example, when we only have access to limited or imperfect data, or when humans act near the extremes of the risk spectrum.
Active Learning of Reward Functions
Human preferences play a key role in specifying how robotic systems should act, e.g., how an assistive robot arm should move or how an autonomous car should drive. However, much of the success of reward learning algorithms can be attributed to the availability of large amounts of labeled data, and collecting and labeling data is costly and time-consuming in most robotics applications. In addition, humans are not always capable of reliably assigning a success value (reward) to a given robot action, and their demonstrations are usually suboptimal due to the difficulty of operating robots with more than a few degrees of freedom. Our work develops active learning algorithms that efficiently query users for the most informative piece of data [RSS 2017, CoRL 2018, RSS 2019, CoRL 2019, IROS 2019, CDC 2019, RSS 2020a].
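To make the idea of querying for the most informative data concrete, here is a minimal sketch. It assumes a linear reward r = w · φ(ξ) over trajectory features, a logistic (Bradley-Terry) model of how a user answers "do you prefer trajectory A or B?", a sample-based posterior over w, and expected information gain as the query-selection criterion. The helper names, the candidate pool, and the simulated user are hypothetical; this is an illustration of the technique, not a reproduction of any specific paper's implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pref_prob(w, phi_a, phi_b, beta=1.0):
    """P(user prefers trajectory A over B | w), assuming a linear reward
    r = w . phi and a logistic (Bradley-Terry) choice model."""
    return 1.0 / (1.0 + math.exp(-beta * (dot(w, phi_a) - dot(w, phi_b))))


def entropy(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def info_gain(samples, weights, phi_a, phi_b):
    """Mutual information between the user's answer and w, estimated from
    weighted samples of the current posterior over w."""
    total = sum(weights)
    probs = [pref_prob(w, phi_a, phi_b) for w in samples]
    p_mean = sum(wt * p for wt, p in zip(weights, probs)) / total
    expected_cond = sum(wt * entropy(p) for wt, p in zip(weights, probs)) / total
    return entropy(p_mean) - expected_cond


def update(samples, weights, phi_a, phi_b, prefers_a):
    """Bayesian reweighting of the samples after one answer (normalized)."""
    new = []
    for w, wt in zip(samples, weights):
        p = pref_prob(w, phi_a, phi_b)
        new.append(wt * (p if prefers_a else 1.0 - p))
    total = sum(new)
    return [x / total for x in new]


def unit_samples(n, dim, rng):
    """n reward-weight vectors drawn uniformly from the unit sphere (the prior)."""
    out = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        out.append([x / norm for x in v])
    return out


def posterior_mean(samples, weights):
    total = sum(weights)
    return [sum(wt * w[i] for w, wt in zip(samples, weights)) / total
            for i in range(len(samples[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 3
    true_w = [0.8, -0.6, 0.0]  # hypothetical ground-truth user weights
    samples = unit_samples(400, dim, rng)
    weights = [1.0 / len(samples)] * len(samples)
    # Hypothetical pool of candidate queries: pairs of trajectory feature vectors.
    pool = [([rng.uniform(-1, 1) for _ in range(dim)],
             [rng.uniform(-1, 1) for _ in range(dim)]) for _ in range(60)]
    for _ in range(10):
        phi_a, phi_b = max(pool, key=lambda q: info_gain(samples, weights, q[0], q[1]))
        pool.remove((phi_a, phi_b))  # do not ask the same question twice
        prefers_a = dot(true_w, phi_a) > dot(true_w, phi_b)  # simulated, noiseless user
        weights = update(samples, weights, phi_a, phi_b, prefers_a)
    print("posterior mean of w:", posterior_mean(samples, weights))
```

The selected query is one whose answer the current posterior is uncertain about while plausible reward functions individually disagree on it, so each question carries as much information as possible. In practice the candidate pool would be built from trajectories the robot can actually execute.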
We study how to optimally integrate different sources of data from humans. Specifically, when learning from both expert demonstrations and active queries, we prove that the optimal integration is to warm-start the reward learning algorithm by learning from expert demonstrations, and fine-tune the model using active preferences.
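The warm-start-then-fine-tune recipe can be sketched as two Bayesian updates applied, in that order, to the same set of reward samples. This sketch assumes a Boltzmann-rational demonstrator who chooses among a finite, hypothetical set of candidate trajectories, and a logistic model for preference answers; it illustrates the ordering of the two data sources, not the exact algorithm or proof from our paper.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def demo_log_likelihood(w, demo, candidates, beta=1.0):
    """log P(demo | w) for a Boltzmann-rational demonstrator who picks
    among a finite set of candidate trajectories (feature vectors)."""
    scores = [beta * dot(w, phi) for phi in candidates]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return beta * dot(w, demo) - log_z


def warm_start(samples, demos, candidates):
    """Step 1: turn expert demonstrations into a prior over reward samples."""
    logs = [sum(demo_log_likelihood(w, d, candidates) for d in demos) for w in samples]
    m = max(logs)
    weights = [math.exp(ll - m) for ll in logs]
    total = sum(weights)
    return [x / total for x in weights]


def preference_update(samples, weights, phi_a, phi_b, prefers_a, beta=1.0):
    """Step 2: fine-tune the demonstration-based prior with one preference answer."""
    new = []
    for w, wt in zip(samples, weights):
        p_a = 1.0 / (1.0 + math.exp(-beta * (dot(w, phi_a) - dot(w, phi_b))))
        new.append(wt * (p_a if prefers_a else 1.0 - p_a))
    total = sum(new)
    return [x / total for x in new]


if __name__ == "__main__":
    # Hypothetical 2-D reward samples on the unit circle.
    samples = [[math.cos(2 * math.pi * k / 36), math.sin(2 * math.pi * k / 36)]
               for k in range(36)]
    # Hypothetical feature vectors of trajectories the demonstrator could have shown.
    candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    prior = warm_start(samples, demos=[[1.0, 0.0]], candidates=candidates)
    posterior = preference_update(samples, prior, [0.7, 0.7], [0.7, -0.7], prefers_a=True)
    best = max(range(len(samples)), key=lambda i: posterior[i])
    print("most probable w:", samples[best])
```

Demonstrations are cheap to collect in bulk but noisy, so they are used here only to concentrate the prior; the preference answers then refine that prior where the demonstrations were ambiguous.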
We also study learning non-linear reward functions, batch active learning of reward functions, and dynamically changing human reward functions, as well as optimizing queries for ease of answering to enable more reliable and intuitive interaction with humans.
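Batch active learning asks several questions at once, which raises a new problem: the k individually most informative queries are often near-duplicates of each other. One simple way to trade off informativeness against diversity is sketched below. The greedy farthest-point rule is a stand-in heuristic chosen for brevity (not necessarily the batch selection method used in our work), and the informativeness scores are assumed to be precomputed, for example by an information-gain estimate.

```python
import math


def batch_select(diffs, scores, top_m, k):
    """Choose up to k query indices for one batch.

    diffs[i]  : feature difference phi(A_i) - phi(B_i) of candidate query i
    scores[i] : informativeness of query i (assumed precomputed)
    Keeps the top_m most informative candidates, then greedily adds the
    candidate whose nearest already-chosen query is farthest away.
    """
    ranked = sorted(range(len(diffs)), key=lambda i: scores[i], reverse=True)[:top_m]
    chosen = [ranked[0]]
    while len(chosen) < min(k, len(ranked)):
        remaining = [i for i in ranked if i not in chosen]
        best = max(remaining,
                   key=lambda i: min(math.dist(diffs[i], diffs[j]) for j in chosen))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    diffs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical
    scores = [0.90, 0.89, 0.50, 0.10]                            # hypothetical
    print(batch_select(diffs, scores, top_m=3, k=2))  # prints [0, 2]
```

Plain top-2 selection would return [0, 1], two almost identical questions; the diversity step replaces the second one with query 2, which probes a different direction in feature space.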
Risk-Aware Human Models
Some of today’s robots model humans as if they were also robots, and assume users are always optimal. Other robots account for human limitations and relax this assumption, so that the human is assumed to be noisily rational. Both of these models make sense when the human receives deterministic rewards. But in real-world scenarios, rewards are rarely deterministic. Instead, we consider settings where these simplifying assumptions fail and humans need to make choices subject to risk and uncertainty. In these settings, humans often exhibit cognitive biases that lead to suboptimal behavior. For example, when deciding between gaining 100 dollars with certainty or 130 dollars only 80% of the time, people tend to make the risk-averse choice, even though the gamble has the higher expected gain (0.8 × 130 = 104 dollars)!
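The arithmetic behind this example, and what a noisily rational model predicts for it, fits in a few lines. The sketch assumes a Boltzmann (softmax) choice model over expected values with a hypothetical rationality coefficient beta = 0.1.

```python
import math


def expected_value(lottery):
    """Expected monetary value of a lottery given as [(outcome, prob), ...]."""
    return sum(x * p for x, p in lottery)


def noisily_rational_choice(lotteries, beta=0.1):
    """Boltzmann (softmax) choice probabilities over expected values;
    beta is a hypothetical rationality coefficient."""
    scores = [beta * expected_value(lot) for lot in lotteries]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    sure = [(100.0, 1.0)]                # $100 with certainty
    gamble = [(130.0, 0.8), (0.0, 0.2)]  # $130 with 80% probability, else nothing
    print(expected_value(sure), expected_value(gamble))
    p_sure, p_gamble = noisily_rational_choice([sure, gamble])
    print(f"P(sure) = {p_sure:.2f}, P(gamble) = {p_gamble:.2f}")
```

Because the gamble's expected value (104) exceeds the sure 100, this model predicts the gamble about 60% of the time (P(sure) = 0.40, P(gamble) = 0.60), the opposite of the risk-averse choice people tend to make. A larger beta pushes the prediction even further toward the gamble.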
We adopt a well-known Risk-Aware human model from behavioral economics called Cumulative Prospect Theory and enable robots to leverage this model during human-robot interaction. Our work extends existing rational human models so that collaborative robots can anticipate and plan around suboptimal human behavior during interaction.
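Cumulative Prospect Theory replaces expected value with a reference-dependent value function (concave for gains, steeper for losses) and a rank-dependent probability weighting function that overweights small probabilities and underweights large ones. The sketch below uses the functional forms and median parameter estimates reported by Tversky and Kahneman (1992); these are textbook defaults assumed here for illustration, not the parameters fitted in our human studies.

```python
def weight(p, g):
    """Tversky-Kahneman (1992) probability weighting function."""
    p = min(max(p, 0.0), 1.0)  # guard against float drift outside [0, 1]
    return p ** g / (p ** g + (1.0 - p) ** g) ** (1.0 / g)


def cpt_value(lottery, alpha=0.88, lam=2.25, gamma=0.61, delta=0.69):
    """Cumulative Prospect Theory value of a lottery [(outcome, prob), ...],
    with outcomes measured relative to a reference point of 0."""
    gains = sorted((o for o in lottery if o[0] > 0), key=lambda o: -o[0])
    losses = sorted((o for o in lottery if o[0] < 0), key=lambda o: o[0])
    value, cum = 0.0, 0.0
    for x, p in gains:  # rank-dependent decision weights, from the best gain down
        value += (weight(cum + p, gamma) - weight(cum, gamma)) * x ** alpha
        cum += p
    cum = 0.0
    for x, p in losses:  # from the worst loss up; losses are scaled by lam
        value += (weight(cum + p, delta) - weight(cum, delta)) * -lam * (-x) ** alpha
        cum += p
    return value


if __name__ == "__main__":
    sure = [(100.0, 1.0)]
    gamble = [(130.0, 0.8), (0.0, 0.2)]
    print(cpt_value(sure), cpt_value(gamble))
```

Applied to the 100-dollar versus 130-dollar example, the sure option receives the higher CPT value because w(0.8) ≈ 0.61 discounts the likely win. A robot can plug values like these into a softmax choice model in place of expected values to predict the risk-averse choice.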
Collaborative Block Stacking. We demonstrate this risk-aware modeling and planning in a collaborative block stacking task. The robot is collaboratively building a tower with a human. The two can either build an efficient but unstable tower, or an inefficient but stable one. The robot here plans with two different models of the human: the noisily rational baseline and our Risk-Aware model. Planning with these models leads the robot to choose two different trajectories:
Aggressive but Rational. When the robot is using the noisily rational model, it immediately goes for the closer cup, since this behavior is more efficient. Put another way, the robot using the noisily rational model incorrectly anticipates that the human wants to make the efficient but unstable tower. This erroneous prediction causes the human and robot to clash, and the robot has to undo its mistake.
Conservative and Risk-Aware. A Risk-Aware robot gets this prediction right: it correctly anticipates that the human is overly concerned about the tower falling, and starts to build the less efficient but stable tower. Having the right prediction here prevents the human and robot from reaching for the same cup, so that they collaborate more seamlessly during the task!
Our work integrates learning techniques with models of human cognitive biases to anticipate human behavior in risk-sensitive scenarios and to better coordinate and collaborate with humans [RSS 2020b, HRI 2020].
Incomplete List of Related Publications:
- Erdem Bıyık*, Nicolas Huynh*, Mykel J. Kochenderfer, Dorsa Sadigh. Active Preference-Based Gaussian Process Regression for Reward Learning. Proceedings of Robotics: Science and Systems (RSS), July 2020.
- Minae Kwon, Erdem Bıyık, Aditi Talati, Karan Bhasin, Dylan P. Losey, Dorsa Sadigh. When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans. ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2020.
- Zhangjie Cao*, Erdem Bıyık*, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, Dorsa Sadigh. Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving. Proceedings of Robotics: Science and Systems (RSS), July 2020.
- Chandrayee Basu, Erdem Bıyık, Zhixun He, Mukesh Singhal, Dorsa Sadigh. Active Learning of Reward Dynamics from Hierarchical Queries. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019.
- Erdem Bıyık, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, Dorsa Sadigh. Asking Easy Questions: A User-Friendly Approach to Active Reward Learning. Proceedings of the 3rd Conference on Robot Learning (CoRL), October 2019.
- Malayandi Palan*, Nicholas C. Landolfi*, Gleb Shevchuk, Dorsa Sadigh. Learning Reward Functions by Integrating Human Demonstrations and Preferences. Proceedings of Robotics: Science and Systems (RSS), June 2019.
- Erdem Bıyık, Dorsa Sadigh. Batch Active Preference-Based Learning of Reward Functions. Proceedings of the 2nd Conference on Robot Learning (CoRL), October 2018.
- Dorsa Sadigh, Anca D. Dragan, S. Shankar Sastry, Sanjit A. Seshia. Active Preference-Based Learning of Reward Functions. Proceedings of Robotics: Science and Systems (RSS), July 2017.