PrefCLM: Enhancing Preference-Based Reinforcement Learning With Crowdsourced Large Language Models

Cited by: 0

Authors:

Wang, Ruiqi [1]

Zhao, Dezhong [1,2]

Yuan, Ziqin [1]

Obi, Ike [1]

Min, Byung-Cheol [1]

Affiliations:

[1] Purdue Univ, Dept Comp & Informat Technol, SMART Lab, W Lafayette, IN 47907 USA

[2] Beijing Univ Chem Technol, Coll Mech & Elect Engn, Beijing 100020, Peoples R China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2025, Vol. 10, No. 3

Keywords:

Robots; Trajectory; Human-robot interaction; Crowdsourcing; Collective intelligence; Robot learning; Reinforcement learning; Large language models; Benchmark testing; Vectors; large language model; preference-based reinforcement learning

DOI:

10.1109/LRA.2025.3528663

Chinese Library Classification (CLC) number:

TP24 [Robotics]

Discipline classification codes:

080202; 1405

Abstract:

Preference-based reinforcement learning (PbRL) is emerging as a promising approach to teaching robots through human comparative feedback without complex reward engineering. However, the substantial volume of human feedback required hinders broader applications. In this work, we introduce PrefCLM, a novel framework that utilizes crowdsourced large language models (LLMs) as synthetic teachers in PbRL. We utilize Dempster-Shafer Theory to fuse individual preference beliefs from multiple LLM agents at the score level, efficiently leveraging their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline, enabling iterative and collective refinements that adapt to the nuanced and individualized preferences inherent to human-robot interaction (HRI) scenarios. Experimental results across various general RL tasks show that PrefCLM achieves competitive performance compared to expert-engineered scripted teachers and excels in facilitating more natural and efficient behaviors. A real-world user study (N = 10) further demonstrates its capability to tailor robot behaviors to individual user preferences, enhancing user satisfaction in HRI scenarios.
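The abstract names Dempster-Shafer Theory as the fusion mechanism but this record gives no formulas, so the following is only a minimal sketch of the textbook Dempster rule of combination, not the authors' implementation. The hypothesis labels (`first`, `second`), the two example agents, and their mass values are hypothetical and chosen purely for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses in each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the shared hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Contradictory evidence is tracked and normalized away below.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical frame: which of two trajectories is preferred.
FIRST = frozenset({"first"})
SECOND = frozenset({"second"})
EITHER = FIRST | SECOND  # mass assigned here expresses uncertainty

# Hypothetical beliefs from two LLM agents (illustrative numbers only).
agent_a = {FIRST: 0.6, SECOND: 0.1, EITHER: 0.3}
agent_b = {FIRST: 0.5, SECOND: 0.2, EITHER: 0.3}

fused = dempster_combine(agent_a, agent_b)
```

In this sketch, mass placed on `EITHER` lets an agent express uncertainty rather than forcing a hard vote, which is one reason a belief-level fusion can differ from simple majority voting.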


Pages: 2486-2493

Page count: 8

