Human skill knowledge guided global trajectory policy reinforcement learning method

Times cited: 0

Authors:

Zang, Yajing [1]

Wang, Pengfei [1]

Zha, Fusheng [1]

Guo, Wei [1]

Li, Chuanfeng [2]

Sun, Lining [1]

Affiliations:

[1] Harbin Inst Technol, State Key Lab Robot & Syst, Harbin, Peoples R China

[2] Harbin Inst Technol, Sch Elect & Informat Engn, Harbin, Peoples R China

Source:

FRONTIERS IN NEUROROBOTICS | 2024 / Vol. 18

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

path planning; imitation learning; reinforcement learning; behavioral cloning; probabilistic movement primitives;

DOI:

10.3389/fnbot.2024.1368243

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditional trajectory learning methods based on Imitation Learning (IL) only learn the trajectory knowledge already contained in human demonstrations. As a result, they cannot adapt that knowledge to the task environment by interacting with the environment and fine-tuning the policy. To address this problem, a global trajectory learning method that combines IL with Reinforcement Learning (RL) to adapt the knowledge policy to the environment is proposed. In this paper, IL is first used to acquire basic trajectory skills, and the agent then uses RL to explore and exploit policies better suited to the current environment. The basic trajectory skills comprise the knowledge policy and the time-stage information over the whole task space, which help the agent learn the time series of the trajectory and guide the subsequent RL process. Notably, neural networks are not used to model the action policy or the Q values during the RL process. Instead, these are sampled and updated over the whole task space and, after the RL process, transferred to networks through Behavior Cloning (BC) to obtain a continuous and smooth global trajectory policy. The feasibility and effectiveness of the method were validated in a custom Gym environment for a flower-drawing task, and the learned policy was then executed in a real-world robotic drawing experiment.
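The abstract describes an RL stage that keeps the policy and Q values in a table over the whole task space, seeded by demonstration knowledge, rather than in neural networks. A minimal sketch of that general idea follows, assuming a hypothetical 4x4 grid task space, a hand-written demonstration, a hypothetical obstacle, and plain epsilon-greedy tabular Q-learning with a fixed bonus on demonstrated actions. None of these names or parameters (`DEMO`, `BLOCKED`, `bonus`, `alpha`, `gamma`, `eps`) come from the paper, and the time-stage information and the final BC transfer to networks are not shown.

```python
import random

# Hypothetical 4x4 grid standing in for the "whole task space"; states are (x, y) cells.
ACTIONS = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}
SIZE = 4
START, GOAL = (0, 0), (3, 3)
BLOCKED = {(2, 0)}  # hypothetical obstacle that the demonstration did not account for

# Hypothetical human demonstration as (state, action) pairs: along the bottom row, then up.
DEMO = [((0, 0), "R"), ((1, 0), "R"), ((2, 0), "R"),
        ((3, 0), "U"), ((3, 1), "U"), ((3, 2), "U")]


def step(state, action):
    """Deterministic transition: -1 per move, +10 on reaching the goal."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in BLOCKED:
        nxt = state  # moving off-grid or into the obstacle leaves the agent in place
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL


def seed_q_from_demo(demo, bonus=1.0):
    """Q-table over every cell of the task space, biased toward demonstrated actions."""
    q = {((x, y), a): 0.0 for x in range(SIZE) for y in range(SIZE) for a in ACTIONS}
    for state, action in demo:
        q[(state, action)] += bonus
    return q


def greedy(q, state):
    """Action with the highest Q value in this state (first one wins on ties)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_learning(q, episodes=1000, alpha=0.5, gamma=0.95, eps=0.2, max_steps=50, seed=0):
    """Epsilon-greedy tabular Q-learning that refines the demonstration-seeded table in place."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            a = rng.choice(list(ACTIONS)) if rng.random() < eps else greedy(q, s)
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if done:
                break
    return q


def greedy_rollout(q, max_steps=20):
    """Follow the greedy table policy from START and return the visited cells."""
    path, s = [START], START
    for _ in range(max_steps):
        s, _, done = step(s, greedy(q, s))
        path.append(s)
        if done:
            break
    return path


if __name__ == "__main__":
    q = q_learning(seed_q_from_demo(DEMO))
    print(greedy_rollout(q))
```

In this toy setup the demonstration passes through cell (2, 0), which the hypothetical environment blocks. The demonstration bonus only shapes the initial table and early exploration; after Q-learning, the greedy path reaches the goal in six moves by detouring around the obstacle. In the paper's pipeline, a table like this would then be distilled into networks through BC to obtain a continuous policy; that step is outside this sketch.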

Pages: 12

References: 50