Boosting Reinforcement Learning via Hierarchical Game Playing With State Relay

Cited by: 1

Authors:

Liu, Chanjuan [1]

Cong, Jinmiao [2]

Liu, Guangyuan [2]

Jiang, Guifei [3]

Xu, Xirong [2]

Zhu, Enqiang [4]

Affiliations:

[1] Dalian Univ Technol, Canc Hosp, Sch Comp Sci & Technol, Dalian 116024, Peoples R China

[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China

[3] Nankai Univ, Coll Software, Tianjin 300350, Peoples R China

[4] Guangzhou Univ, Inst Comp Sci & Technol, Guangzhou 510006, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024

Keywords:

Curse of dimensionality; game theory; hierarchical reinforcement learning (HRL); sampling efficiency; GO; NETWORKS;

DOI:

10.1109/TNNLS.2024.3386717

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep reinforcement learning (DRL) is widely applied and has been studied extensively in the motion planning community in recent years. In current DRL research, however, the agent's state information is reset after each episode, whether or not the task was completed. This leads to a low sample utilization rate and hinders further exploration of the environment. Moreover, in the initial training stage the agent generally has weak learning ability, which reduces training efficiency on complex tasks. This study proposes a new hierarchical reinforcement learning (HRL) framework, hierarchical learning based on game playing with state relay (HGR). In particular, an auxiliary penalty is introduced to regulate task difficulty, and a training mechanism, the state relay mechanism, is designed. The relay mechanism makes full use of the agent's intermediate states and expands the environment exploration of the low-level policy. The algorithm improves the sample utilization rate, alleviates the sparse reward problem, and thereby enhances training performance in complex environments. Simulation tests on two public experiment platforms, MazeBase and MuJoCo, are carried out to verify the effectiveness of the proposed method. The results show that HGR significantly benefits the reinforcement learning (RL) area.
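The abstract does not spell out how the state relay mechanism works, so the following is only a minimal sketch of the general idea it describes: starting the next episode from the agent's last intermediate state instead of always resetting to the initial state. Everything here is an assumption made for illustration and is not taken from the paper: the toy `Corridor` environment, the `run` helper, the `relay` flag, and the rule "relay only when the episode ended without reaching the goal". The auxiliary penalty and the hierarchical (high-level/low-level) structure of HGR are not modeled.

```python
import random


class Corridor:
    """Hypothetical toy task: walk from position 0 to position `length`."""

    def __init__(self, length=20):
        self.length = length
        self.pos = 0

    def reset(self, start=0):
        # Allow an arbitrary start so an episode can resume from a relayed state.
        self.pos = start
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); position is clipped to [0, length].
        self.pos = max(0, min(self.length, self.pos + action))
        done = self.pos == self.length
        return self.pos, done


def random_policy(state, rng):
    # Placeholder policy; a learned low-level policy would go here.
    return rng.choice((-1, 1))


def run(env, episodes, horizon, relay, policy=random_policy, seed=0):
    """Return the set of states visited over all episodes."""
    rng = random.Random(seed)
    visited = set()
    start = 0
    for _ in range(episodes):
        state = env.reset(start)
        visited.add(state)
        done = False
        for _ in range(horizon):
            state, done = env.step(policy(state, rng))
            visited.add(state)
            if done:
                break
        # Assumed relay rule: if the episode timed out, hand its final state
        # to the next episode; otherwise (or without relay) reset to 0.
        start = state if (relay and not done) else 0
    return visited
```

With a short horizon, an agent that always resets can never get past `horizon` steps from the start, whereas the relayed agent accumulates progress across episodes. For example, `run(Corridor(20), episodes=4, horizon=5, relay=True, policy=lambda s, rng: 1)` covers the whole corridor, while the same call with `relay=False` only ever visits positions 0 to 5.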
