A reinforcement learning from human feedback based method for task allocation of human robot collaboration assembly considering human preference

Times Cited: 1
Authors
Wang, Jingfei [1 ]
Yan, Yan [1 ]
Hu, Yaoguang [1 ]
Yang, Xiaonan [1 ]
Affiliations
[1] Beijing Inst Technol, Dept Mfg Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Human robot collaboration; Task allocation and sequencing; Reinforcement learning; Reinforcement learning from human feedback; OPTIMIZATION; FRAMEWORK;
DOI
10.1016/j.aei.2025.103497
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human-robot collaboration is currently regarded as a key enabling technology for the human-centered manufacturing of Industry 5.0. Reasonable task allocation and sequencing in the human-robot collaborative process are necessary to fully exploit the complementary strengths of workers and robots and to improve workers' performance and experience. Although many human factors are considered in existing task allocation studies, the complexity of the decision-making process makes it difficult for workers to express preferred choices and provide feedback that directly influences the decisions; as a result, the solution obtained may not suit the individual worker. To address this problem, a task allocation method based on reinforcement learning from human feedback is proposed in this study. In this method, multi-agent reinforcement learning is applied to pre-train agent models that solve the task allocation and sequencing problem with multiple optimization objectives. An analytic hierarchy process-based method is used to analyze workers' action preferences and build a heuristic reward model. Furthermore, a preference training approach using knowledge distillation is proposed: agents are adjusted through preference rewards and pre-trained optimization experience to learn a decision-making policy suited to worker preferences. The effectiveness of the method is verified in comparative and ablation experiments.
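The abstract mentions an analytic hierarchy process (AHP)-based analysis of worker action preferences. As a minimal sketch of how such preference weights are classically derived (the pairwise-comparison matrix and criterion names below are illustrative assumptions, not taken from the paper), the standard AHP procedure computes the principal eigenvector of a pairwise-comparison matrix and checks its consistency ratio:

```python
import numpy as np

def ahp_weights(pairwise: np.ndarray):
    """Classic AHP: priority weights from a pairwise-comparison matrix.

    Returns the normalized principal eigenvector (the weight vector) and the
    consistency ratio CR; CR < 0.1 is conventionally considered acceptable.
    """
    n = pairwise.shape[0]
    eigvals, eigvecs = np.linalg.eig(pairwise)
    k = np.argmax(eigvals.real)                 # principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                                # normalize weights to sum to 1
    ci = (eigvals[k].real - n) / (n - 1)        # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}[n]  # Saaty's random index
    cr = ci / ri                                # consistency ratio
    return w, cr

# Illustrative example: a worker compares three assembly criteria pairwise
# (say physical load vs. precision vs. pace) on Saaty's 1-9 scale.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, cr = ahp_weights(A)
```

The resulting weight vector could then serve as coefficients in a heuristic reward over action attributes; how the paper actually combines it with the reward model is not specified in the abstract.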
Pages: 14