Learning Reward Functions by Integrating Human Demonstrations and Preferences

Cited by: 0

Authors:

Palan, Malayandi [1]

Shevchuk, Gleb [1]

Landolfi, Nicholas C. [1]

Sadigh, Dorsa [1]

Affiliations:

[1] Stanford Univ, Comp Sci, Stanford, CA 94305 USA

Source:

ROBOTICS: SCIENCE AND SYSTEMS XV | 2019

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics, however, IRL often struggles because it is difficult to get high-quality demonstrations; conversely, preference-based learning is very inefficient since it attempts to learn a continuous, high-dimensional function from binary feedback. We propose a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function. Specifically, we (1) use the demonstrations to learn a coarse prior over the space of reward functions, to reduce the effective size of the space from which queries are generated; and (2) use the demonstrations to ground the (active) query generation process, to improve the quality of the generated queries. Our method alleviates the efficiency issues faced by standard preference-based learning methods and does not exclusively depend on (possibly low-quality) demonstrations. In numerical experiments, we find that DemPref is significantly more efficient than a standard active preference-based learning method. In a user study, we compare our method to a standard IRL method; we find that users rated the robot trained with DemPref as being more successful at learning their desired behavior, and preferred to use the DemPref system (over IRL) to train the robot.
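
The two-stage idea in the abstract (demonstrations give a coarse prior, preference answers then refine it) can be illustrated with a small sample-based sketch. This is not the authors' implementation: the linear reward `w . phi`, the exponential demonstration likelihood with temperature `beta`, the logistic choice likelihood, and every function name below are assumptions made only for illustration. Active query generation, which the abstract also mentions, is not modeled here.

```python
import math
import random


def sample_unit_vectors(n, dim, rng):
    """Draw n candidate reward weight vectors uniformly from the unit sphere."""
    out = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(ws):
    total = sum(ws)
    return [w / total for w in ws]


def demo_prior(candidates, demo_features, beta=2.0):
    """Stage 1 (assumed model): weight each candidate w by
    exp(beta * sum of w . phi(demo)) over all demonstrations."""
    logp = [beta * sum(dot(w, phi) for phi in demo_features) for w in candidates]
    m = max(logp)  # subtract the max for numerical stability
    return normalize([math.exp(lp - m) for lp in logp])


def preference_update(candidates, probs, phi_chosen, phi_rejected):
    """Stage 2 (assumed model): Bayes update with a logistic choice
    likelihood, P(chosen | w) = sigmoid(w . phi_chosen - w . phi_rejected)."""
    new = []
    for w, p in zip(candidates, probs):
        diff = dot(w, phi_chosen) - dot(w, phi_rejected)
        new.append(p / (1.0 + math.exp(-diff)))
    return normalize(new)


def posterior_mean(candidates, probs):
    """Probability-weighted average of the candidate weight vectors."""
    dim = len(candidates[0])
    return [sum(p * w[i] for w, p in zip(candidates, probs)) for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    cands = sample_unit_vectors(2000, 2, rng)
    # One hypothetical demonstration with feature vector [0.8, 0.1].
    prior = demo_prior(cands, [[0.8, 0.1]])
    # One hypothetical query: the user prefers features [1, 0] over [0, 1].
    post = preference_update(cands, prior, [1.0, 0.0], [0.0, 1.0])
    print(posterior_mean(cands, prior), posterior_mean(cands, post))
```

In this sketch a single binary answer only reweights candidates that the demonstration has already made plausible, which is one way to read the abstract's claim that the prior reduces the effective size of the space that queries must cover.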


Pages: 10

50 items in total

[1] Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
Biyik, Erdem
Losey, Dylan P.
Palan, Malayandi
Landolfi, Nicholas C.
Shevchuk, Gleb
Sadigh, Dorsa
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2022, 41 (01) : 45 - 67
[2] Reward learning from human preferences and demonstrations in Atari
Ibarz, Borja
Leike, Jan
Pohlen, Tobias
Irving, Geoffrey
Legg, Shane
Amodei, Dario
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[3] Batch Active Learning of Reward Functions from Human Preferences
Biyik, Erdem
Anari, Nima
Sadigh, Dorsa
ACM TRANSACTIONS ON HUMAN-ROBOT INTERACTION, 2024, 13 (02)
[4] Joint Estimation of Expertise and Reward Preferences From Human Demonstrations
Carreno-Medrano, Pamela
Smith, Stephen L.
Kulic, Dana
IEEE TRANSACTIONS ON ROBOTICS, 2023, 39 (01) : 681 - 698
[5] Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning
Huo, Liangyu
Wang, Zulin
Xu, Mai
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023 : 7953 - 7961
[6] Human2bot: learning zero-shot reward functions for robotic manipulation from human demonstrations
Yasir Salam
Yinbei Li
Jonas Herzog
Jiaqiang Yang
Autonomous Robots, 2025, 49 (2)
[7] Reward Learning from Narrated Demonstrations
Tung, Hsiao-Yu
Harley, Adam W.
Huang, Liang-Kang
Fragkiadaki, Katerina
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018 : 7004 - 7013
[8] Reward Learning From Very Few Demonstrations
Eteke, Cem
Kebude, Dogancan
Akgun, Baris
IEEE TRANSACTIONS ON ROBOTICS, 2021, 37 (03) : 893 - 904
[9] Model-based Adversarial Imitation Learning from Demonstrations and Human Reward
Huang, Jie
Hao, Jiangshan
Juan, Rongshun
Gomez, Randy
Nakamura, Keisuke
Li, Guangliang
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023 : 1683 - 1690
[10] Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery
Karimi, Zohre
Ho, Shing-Hei
Thach, Bao
Kuntz, Alan
Brown, Daniel S.
2024 INTERNATIONAL SYMPOSIUM ON MEDICAL ROBOTICS, ISMR 2024, 2024
