PrefCLM: Enhancing Preference-Based Reinforcement Learning With Crowdsourced Large Language Models

Cited by: 0
Authors
Wang, Ruiqi [1]
Zhao, Dezhong [1,2]
Yuan, Ziqin [1]
Obi, Ike [1]
Min, Byung-Cheol [1]
Affiliations
[1] Purdue Univ, Dept Comp & Informat Technol, SMART Lab, W Lafayette, IN 47907 USA
[2] Beijing Univ Chem Technol, Coll Mech & Elect Engn, Beijing 100020, Peoples R China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2025 / Vol. 10 / No. 3
Keywords
Robots; Trajectory; Human-robot interaction; Crowdsourcing; Collective intelligence; Robot learning; Reinforcement learning; Large language models; Benchmark testing; Vectors; large language model; preference-based reinforcement learning;
DOI
10.1109/LRA.2025.3528663
Chinese Library Classification (CLC) number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
Preference-based reinforcement learning (PbRL) is emerging as a promising approach to teaching robots through human comparative feedback without complex reward engineering. However, the substantial volume of human feedback required hinders broader applications. In this work, we introduce PrefCLM, a novel framework that utilizes crowdsourced large language models (LLMs) as synthetic teachers in PbRL. We utilize Dempster-Shafer Theory to fuse individual preference beliefs from multiple LLM agents at the score level, efficiently leveraging their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline, enabling iterative and collective refinements that adapt to the nuanced and individualized preferences inherent to human-robot interaction (HRI) scenarios. Experimental results across various general RL tasks show that PrefCLM achieves competitive performance compared to expert-engineered scripted teachers and excels in facilitating more natural and efficient behaviors. A real-world user study (N = 10) further demonstrates its capability to tailor robot behaviors to individual user preferences, enhancing user satisfaction in HRI scenarios.
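The abstract states that individual preference beliefs from multiple LLM agents are fused with Dempster-Shafer Theory. As a rough illustration only, the sketch below applies the standard Dempster rule of combination to a pairwise preference between two trajectory segments. How PrefCLM actually builds its mass functions from LLM scores is not described in this record, so the frame of discernment, the names `SEG0`, `SEG1`, `EITHER`, `combine`, and `fuse`, and the example masses are all assumptions made for illustration.

```python
from functools import reduce
from itertools import product

# Hypothetical frame of discernment for one pairwise query:
# SEG0 = "segment 0 is preferred", SEG1 = "segment 1 is preferred".
SEG0 = frozenset({"seg0"})
SEG1 = frozenset({"seg1"})
EITHER = SEG0 | SEG1  # mass placed here expresses an agent's uncertainty


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame) to a
    non-negative mass, and the masses of one function sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        intersection = x & y
        if intersection:
            fused[intersection] = fused.get(intersection, 0.0) + mass_x * mass_y
        else:
            # Mass assigned to contradictory hypotheses counts as conflict.
            conflict += mass_x * mass_y
    if 1.0 - conflict <= 1e-12:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Renormalize by the non-conflicting mass.
    return {subset: mass / (1.0 - conflict) for subset, mass in fused.items()}


def fuse(mass_functions):
    """Fuse any number of agents' beliefs by folding combine over them."""
    return reduce(combine, mass_functions)


# Illustrative (made-up) beliefs from two LLM agents.
agent_a = {SEG0: 0.6, SEG1: 0.1, EITHER: 0.3}
agent_b = {SEG0: 0.5, SEG1: 0.2, EITHER: 0.3}
fused = fuse([agent_a, agent_b])
```

In this sketch the fused mass on `SEG0` could serve as a soft preference label for that query, while the mass left on `EITHER` indicates how much uncertainty remains after the agents' beliefs are combined.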
Pages: 2486-2493
Page count: 8