Human skill knowledge guided global trajectory policy reinforcement learning method

Cited: 0
Authors
Zang, Yajing [1 ]
Wang, Pengfei [1 ]
Zha, Fusheng [1 ]
Guo, Wei [1 ]
Li, Chuanfeng [2 ]
Sun, Lining [1 ]
Affiliations
[1] Harbin Inst Technol, State Key Lab Robot & Syst, Harbin, Peoples R China
[2] Harbin Inst Technol, Sch Elect & Informat Engn, Harbin, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
path planning; imitation learning; reinforcement learning; behavioral cloning; probabilistic movement primitives;
DOI
10.3389/fnbot.2024.1368243
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Traditional trajectory learning methods based on Imitation Learning (IL) only learn existing trajectory knowledge from human demonstrations; they cannot adapt that knowledge to the task environment by interacting with it and fine-tuning the policy. To address this problem, a global trajectory learning method that combines IL with Reinforcement Learning (RL) is proposed to adapt the knowledge policy to the environment. In this paper, IL is first used to acquire basic trajectory skills, after which the agent explores and exploits through RL to obtain a policy better suited to the current environment. The basic trajectory skills comprise the knowledge policy and the time-stage information over the whole task space, which help the agent learn the temporal structure of the trajectory and guide the subsequent RL process. Notably, during RL neither the action policy nor the Q value is modeled with neural networks. Instead, both are sampled and updated over the whole task space and, after the RL process, transferred to networks through Behavior Cloning (BC) to obtain a continuous and smooth global trajectory policy. The feasibility and effectiveness of the method were validated in a custom Gym environment for a flower-drawing task, and the learned policy was then executed in a real-world robot drawing experiment.
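The abstract sketches a three-stage pipeline: learn a trajectory prior from demonstrations, run RL with a tabular policy and Q value sampled over the task space, then distill the table into a continuous policy via BC. The short Python sketch below illustrates that overall idea only; the per-step Gaussian prior (a stand-in for probabilistic movement primitives), the one-dimensional drawing task, the prior-shaped reward, and the polynomial least-squares regressor used for the BC step are all illustrative assumptions rather than the paper's actual components.

```python
# Minimal sketch of the pipeline described in the abstract (NOT the paper's
# exact algorithm): (1) imitation: fit a per-step Gaussian trajectory prior
# from demonstrations, (2) RL: prior-shaped tabular Q-learning over a
# discretized task space, (3) BC: distill the table into a smooth policy.
import numpy as np

rng = np.random.default_rng(0)
T, N_DEMOS = 20, 10                                   # time steps, demos

# --- 1. Imitation: per-step Gaussian prior (ProMP stand-in) -----------------
target = np.sin(np.linspace(0, np.pi, T))             # 1-D "drawing" skill
demos = target + 0.05 * rng.standard_normal((N_DEMOS, T))
prior_mu, prior_std = demos.mean(0), demos.std(0) + 1e-3

# --- 2. RL: tabular Q-learning over (time step, discretized position) -------
N_BINS = 31
bins = np.linspace(-0.5, 1.5, N_BINS)                 # position grid
ACTIONS = np.array([-0.1, 0.0, 0.1])                  # position increments
Q = np.zeros((T, N_BINS, len(ACTIONS)))

def reward(t, x):
    # Task reward plus a shaping term that pulls toward the demonstration prior.
    return -abs(x - target[t]) - 0.1 * abs(x - prior_mu[t]) / prior_std[t]

for _ in range(2000):
    x = 0.0
    for t in range(T - 1):
        s = np.digitize(x, bins) - 1
        # epsilon-greedy exploration over the tabular policy
        a = rng.integers(len(ACTIONS)) if rng.random() < 0.2 else Q[t, s].argmax()
        x2 = np.clip(x + ACTIONS[a], -0.5, 1.5)
        s2 = np.digitize(x2, bins) - 1
        Q[t, s, a] += 0.1 * (reward(t + 1, x2) + 0.9 * Q[t + 1, s2].max() - Q[t, s, a])
        x = x2

# --- 3. BC transfer: fit a smooth policy to the tabular greedy actions ------
# Polynomial least squares stands in for the paper's neural network; the point
# is distilling the sampled table into a continuous, smooth global policy.
states = np.array([(t / T, bins[s]) for t in range(T - 1) for s in range(N_BINS)])
acts = np.array([ACTIONS[Q[t, s].argmax()] for t in range(T - 1) for s in range(N_BINS)])
feats = np.column_stack([states**k for k in range(1, 4)] + [np.ones((len(states), 1))])
w, *_ = np.linalg.lstsq(feats, acts, rcond=None)

def smooth_policy(t, x):
    f = np.concatenate([np.array([t / T, x])**k for k in range(1, 4)] + [[1.0]])
    return f @ w

# Roll out the distilled continuous policy.
x, traj = 0.0, []
for t in range(T - 1):
    x = float(np.clip(x + smooth_policy(t, x), -0.5, 1.5))
    traj.append(x)
print("mean tracking error:", np.abs(np.array(traj) - target[1:]).mean())
```

The structural point mirrored from the abstract is that the policy and Q value stay tabular throughout RL and are only afterwards transferred to a smooth function approximator by behavior cloning, yielding a continuous global trajectory policy.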
Pages: 12
Related papers
50 records
  • [1] A knowledge-guided reinforcement learning method for lateral path tracking
    Hu, Bo
    Zhang, Sunan
    Feng, Yuxiang
    Li, Bingbing
    Sun, Hao
    Chen, Mingyang
    Zhuang, Weichao
    Zhang, Yi
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139
  • [2] A Guided-to-Autonomous Policy Learning method of Deep Reinforcement Learning in Path Planning
    Zhao, Wang
    Zhang, Ye
    Li, Haoyu
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA 2024, 2024, : 665 - 672
  • [3] Knowledge guided fuzzy deep reinforcement learning
    Qin, Peng
    Zhao, Tao
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 264
  • [4] Policy Learning with Human Reinforcement
    Hwang, Kao-Shing
    Lin, Jin-Ling
    Shi, Haobin
    Chen, Yu-Ying
    INTERNATIONAL JOURNAL OF FUZZY SYSTEMS, 2016, 18 (04) : 618 - 629
  • [6] Global structure of policy search spaces for reinforcement learning
    Stapelberg, B.
    Malan, K. M.
    PROCEEDINGS OF THE 2019 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION (GECCO'19 COMPANION), 2019, : 1773 - 1781
  • [7] Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation
    Chen, Xiaocong
    Huang, Chaoran
    Yao, Lina
    Wang, Xianzhi
    Liu, Wei
    Zhang, Wenjie
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [8] A knowledge-guided process planning approach with reinforcement learning
    Zhang, Lijun
    Wu, Hongjin
    Chen, Yelin
    Wang, Xuesong
    Peng, Yibing
    JOURNAL OF ENGINEERING DESIGN, 2024,
  • [9] Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning
    Kim, Man-Je
    Park, Hyunsoo
    Ahn, Chang Wook
    ELECTRONICS, 2022, 11 (07)
  • [10] Robotic trajectory tracking control method based on reinforcement learning
    Liu W.
    Xing G.
    Chen H.
    Sun H.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2018, 24 (08): 1996 - 2004