Learning Reward Functions by Integrating Human Demonstrations and Preferences

Authors
Palan, Malayandi [1]
Shevchuk, Gleb [1]
Landolfi, Nicholas C. [1]
Sadigh, Dorsa [1]
Affiliation
[1] Stanford Univ, Comp Sci, Stanford, CA 94305 USA
Source
ROBOTICS: SCIENCE AND SYSTEMS XV | 2019
DOI
Not available
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics, however, IRL often struggles because it is difficult to get high-quality demonstrations; conversely, preference-based learning is very inefficient since it attempts to learn a continuous, high-dimensional function from binary feedback. We propose a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function. Specifically, we (1) use the demonstrations to learn a coarse prior over the space of reward functions, to reduce the effective size of the space from which queries are generated; and (2) use the demonstrations to ground the (active) query generation process, to improve the quality of the generated queries. Our method alleviates the efficiency issues faced by standard preference-based learning methods and does not exclusively depend on (possibly low-quality) demonstrations. In numerical experiments, we find that DemPref is significantly more efficient than a standard active preference-based learning method. In a user study, we compare our method to a standard IRL method; we find that users rated the robot trained with DemPref as being more successful at learning their desired behavior, and preferred to use the DemPref system (over IRL) to train the robot.
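The two-step idea in the abstract (a demonstration-derived prior, then active preference queries) can be illustrated with a small sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes a linear reward R(ξ) = w · φ(ξ) with unit-norm w, trajectories given directly as feature vectors from a finite candidate set, a Boltzmann-rational model for both the demonstration and the preference answers, and an importance-weighted sample posterior in place of whatever sampler the paper actually uses. Query "grounding" is approximated by always including the most recently preferred trajectory (initially the demonstration) in the next query, and choosing the partner whose answer is most uncertain is a simple stand-in for the paper's query-selection objective.

```python
import math
import random

# Illustrative sketch only; all names are hypothetical, not DemPref's code.
# Assumptions: linear reward R(xi) = w . phi(xi) with ||w|| = 1, trajectories
# represented by feature vectors from a finite candidate set, Boltzmann-rational
# demonstrations and answers, and an importance-weighted sample posterior.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_unit_vectors(n, dim, rng):
    """Draw n reward-weight vectors uniformly from the unit sphere in R^dim."""
    out = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        out.append([x / norm for x in v])
    return out


def demo_log_likelihood(w, demo, candidates, beta_d):
    """log P(demo | w), Boltzmann model normalized over the candidate set."""
    scores = [beta_d * dot(w, phi) for phi in candidates]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return beta_d * dot(w, demo) - log_z


def pref_log_likelihood(w, phi_a, phi_b, a_preferred, beta):
    """log P(answer | w) with P(A preferred) = sigmoid(beta * w . (phi_a - phi_b))."""
    z = beta * (dot(w, phi_a) - dot(w, phi_b))
    if not a_preferred:
        z = -z
    # Numerically stable log-sigmoid.
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def normalize(log_weights):
    """Turn log-weights into normalized importance weights."""
    m = max(log_weights)
    ws = [math.exp(lw - m) for lw in log_weights]
    total = sum(ws)
    return [x / total for x in ws]


def posterior_mean(samples, weights):
    return [sum(wt * s[i] for s, wt in zip(samples, weights))
            for i in range(len(samples[0]))]


def pick_query(anchor, candidates, samples, weights, beta):
    """Pair the anchor with the candidate whose answer is most uncertain
    (posterior P(anchor preferred) closest to 0.5)."""
    best, best_gap = None, float("inf")
    for phi in candidates:
        if phi == anchor:
            continue
        p = sum(wt / (1.0 + math.exp(-beta * (dot(s, anchor) - dot(s, phi))))
                for s, wt in zip(samples, weights))
        if abs(p - 0.5) < best_gap:
            best, best_gap = phi, abs(p - 0.5)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [0.8, 0.6]  # hypothetical user; used only to simulate answers
    candidates = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(30)]
    demo = max(candidates, key=lambda phi: dot(true_w, phi))
    samples = sample_unit_vectors(2000, 2, rng)

    # Step 1: coarse prior over w from the demonstration.
    log_w = [demo_log_likelihood(s, demo, candidates, 2.0) for s in samples]

    # Step 2: active preference queries, anchored first on the demonstration,
    # then on whichever trajectory the (simulated) user preferred last.
    anchor = demo
    for _ in range(5):
        other = pick_query(anchor, candidates, samples, normalize(log_w), 5.0)
        answer = dot(true_w, anchor) >= dot(true_w, other)
        log_w = [lw + pref_log_likelihood(s, anchor, other, answer, 5.0)
                 for lw, s in zip(log_w, samples)]
        anchor = anchor if answer else other

    print("posterior mean of w:", posterior_mean(samples, normalize(log_w)))
```

Starting the preference phase from demonstration-derived weights rather than a uniform prior means each binary answer refines an already-narrowed region of reward space, which is the efficiency argument the abstract makes; the rationality coefficients (2.0 and 5.0 here) are arbitrary illustrative values.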
Pages: 10