Style-Based Reinforcement Learning: Task Decoupling Personalization for Human-Robot Collaboration

Cited by: 1
Authors
Bonyani, Mahdi [1 ]
Soleymani, Maryam [1 ]
Wang, Chao [1 ]
Affiliations
[1] Louisiana State Univ, Bert S Turner Dept Construct Management, Baton Rouge, LA 70803 USA
Source
UNIVERSAL ACCESS IN HUMAN-COMPUTER INTERACTION, PT I, UAHCI 2024 | 2024, Vol. 14696
Funding
US National Science Foundation
Keywords
Human-Robot Collaboration; Task Decoupling Personalization; Reinforcement Learning;
DOI
10.1007/978-3-031-60875-9_13
Chinese Library Classification
TP3 [Computing technology; computer technology]
Subject Classification Code
0812
Abstract
Intelligent robots intended to interact with people in everyday settings must be able to adapt to the varying preferences of their users. Through human-robot collaboration, robots can be taught personalized behaviors without a laborious, hand-crafted reward function. Instead, a robot can learn a reward from a human's style preferences between pairs of robot motions, an approach called style-based reinforcement learning (SRL). However, existing SRL algorithms suffer from poor exploration of the reward and state spaces, low feedback efficiency, and weak performance on complex interactive tasks. We incorporate prior knowledge of the task into SRL to improve its results. Specifically, we decouple the task in human-robot collaboration from the style. A coarse task reward derived from the task prior guides the robot toward more efficient task exploration, and the robot's policy is then optimized with a reward learned by SRL to better match human styles. Reward shaping allows these two components to be fused naturally. Experimental results demonstrate that our approach is a feasible and efficient way to achieve personalized human-robot collaboration.
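The record above gives only the abstract, but the described pipeline (a coarse, prior-based task reward plus a style reward learned from pairwise human preferences, fused through reward shaping) can be sketched. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the network architecture, the Bradley-Terry preference loss, the names `StyleReward`, `preference_loss`, and `shaped_reward`, and the shaping weight `beta` are all hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumptions, not the paper's code): a style-reward network
# trained from pairwise human preferences with a Bradley-Terry model, then
# combined with a coarse, prior-based task reward via reward shaping.

class StyleReward(nn.Module):
    """Maps a (state, action) pair to a scalar style reward."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def preference_loss(r_hat: StyleReward, seg_a, seg_b, pref: int) -> torch.Tensor:
    """Bradley-Terry loss over two motion segments; pref=1 means the human
    preferred segment A. seg_a / seg_b are (obs, act) tensors of shape [T, dim]."""
    ret_a = r_hat(*seg_a).sum()          # predicted return of segment A
    ret_b = r_hat(*seg_b).sum()          # predicted return of segment B
    logits = torch.stack([ret_a, ret_b]).unsqueeze(0)
    target = torch.tensor([0 if pref == 1 else 1])
    return nn.functional.cross_entropy(logits, target)


def shaped_reward(task_reward: float, style_reward: float,
                  beta: float = 0.5) -> float:
    """Reward shaping: the coarse task prior guides exploration, the learned
    style term personalizes behavior. beta is a hypothetical trade-off weight."""
    return task_reward + beta * style_reward
```

In a training loop of this kind, the coarse task term would come from a rough prior (e.g., distance to a task goal), while the style network would be retrained periodically as new preference queries are answered, and the shaped sum would serve as the reward for the policy-optimization step.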
Pages: 197-212
Number of pages: 16