Backtracking Exploration for Reinforcement Learning

Cited: 0
Authors
Chen, Xingguo [1 ]
Chen, Zening [1 ]
Sun, Dingyuanhao [1 ]
Gao, Yang [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Source
2023 5TH INTERNATIONAL CONFERENCE ON DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
backtracking; exploration; convergence speed;
DOI
10.1145/3627676.3627687
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploration of the behavior policy plays an important role in reinforcement learning, as it helps learning algorithms escape local optima. Taking linear value function approximation as an example, exploration directly affects which states are sampled and thereby alters the state distribution. This distribution is a component of the key matrix, and the magnitude of the smallest eigenvalue of the key matrix is proportional to the convergence speed. However, existing exploration methods are constrained by the MDP chain and require step-by-step backtracking to reach the target policy distribution. This paper breaks the assumption that the action settings of the training environment must be identical to those of the testing environment: it introduces state resetting in the training environment and proposes a backtracking exploration algorithm with a time window and punishment. The algorithm can be combined directly with existing exploration strategies and value function update rules, and it has the potential to become a new paradigm for the reinforcement learning training process. Experimental results validate the effectiveness of the proposed algorithm.
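The abstract's central idea (a training-only state-reset action, a sliding time window of recent states, and a punishment charged when backtracking) can be illustrated with a small sketch. Since the paper's details are not given here, everything below is a hypothetical construction, not the authors' implementation: the `ChainMDP`, the reset probability `p_back`, and the way the punishment is applied to the transition following a backtrack are all assumptions, combined with ordinary epsilon-greedy Q-learning as the base exploration strategy and update rule.

```python
import random


class ChainMDP:
    """Toy chain MDP: states 0..n-1, start at 0, reward +1 on reaching state n-1."""

    def __init__(self, n=8):
        self.n = n
        self.state = 0

    def reset(self, state=0):
        # State resetting: the *training* environment can restore any visited
        # state -- an action the testing environment is not assumed to offer.
        self.state = state
        return self.state

    def step(self, action):
        # action 0 = left, 1 = right
        if action == 1:
            self.state = min(self.state + 1, self.n - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done


def backtracking_q_learning(env, episodes=500, alpha=0.5, gamma=0.95, eps=0.2,
                            window=10, p_back=0.2, punishment=0.05, seed=0):
    """Epsilon-greedy Q-learning plus backtracking exploration: with probability
    p_back, jump to a state drawn from a sliding time window of recently
    visited states, charging a small punishment on the following transition."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(env.n)]
    for _ in range(episodes):
        s = env.reset(0)
        recent = [s]                      # sliding time window of visited states
        penalty = 0.0
        for _ in range(4 * env.n):
            if rng.random() < p_back:
                s = env.reset(rng.choice(recent))   # backtrack via state reset
                penalty = punishment                # punish the next transition
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)      # explore (ties broken randomly)
            else:
                a = int(Q[s][1] > Q[s][0])
            s2, r, done = env.step(a)
            r, penalty = r - penalty, 0.0
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            recent = (recent + [s2])[-window:]
            s = s2
            if done:
                break
    return Q


env = ChainMDP(8)
Q = backtracking_q_learning(env)
policy = [int(Q[s][1] > Q[s][0]) for s in range(env.n - 1)]
print(policy)   # the learned greedy policy should step right, toward the goal
```

Note how the backtracking mechanism plugs in without touching the epsilon-greedy choice or the Q-update itself, matching the abstract's claim that the scheme composes with existing exploration strategies and value-function update rules.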
Pages: 7