Backtracking Exploration for Reinforcement Learning

Cited: 0
Authors
Chen, Xingguo [1 ]
Chen, Zening [1 ]
Sun, Dingyuanhao [1 ]
Gao, Yang [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Source
2023 5TH INTERNATIONAL CONFERENCE ON DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
backtracking; exploration; convergence speed;
DOI
10.1145/3627676.3627687
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploration of the behavior policy plays an important role in reinforcement learning, as it helps learning algorithms escape local optima. Taking linear value function approximation as an example, exploration directly affects which states are sampled and thereby alters the state distribution. This distribution is a component of the key matrix, and the magnitude of the smallest eigenvalue of the key matrix is proportional to the convergence speed. However, existing exploration methods are constrained by the MDP chain and require step-by-step backtracking to reach the target policy distribution. This paper breaks the assumption that the action settings of the training environment must be identical to those of the testing environment: it introduces state resetting in the training environment and proposes a backtracking exploration algorithm with a time window and punishment. The algorithm can be combined directly with existing exploration strategies and value function update rules, and it has the potential to become a new paradigm for the reinforcement learning training process. Experimental results validate the effectiveness of the proposed algorithm.
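The abstract's central idea (a training-only state-reset action, a sliding time window of recent states, and a punishment charged when backtracking) can be illustrated with a small sketch. Since the paper's details are not given here, everything below is a hypothetical construction, not the authors' implementation: the `ChainMDP`, the reset probability `p_back`, and the way the punishment is applied to the transition following a backtrack are all assumptions, combined with ordinary epsilon-greedy Q-learning as the base exploration strategy and update rule.

```python
import random


class ChainMDP:
    """Toy chain MDP: states 0..n-1, start at 0, reward +1 on reaching state n-1."""

    def __init__(self, n=8):
        self.n = n
        self.state = 0

    def reset(self, state=0):
        # State resetting: the *training* environment can restore any visited
        # state -- an action the testing environment is not assumed to offer.
        self.state = state
        return self.state

    def step(self, action):
        # action 0 = left, 1 = right
        if action == 1:
            self.state = min(self.state + 1, self.n - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done


def backtracking_q_learning(env, episodes=500, alpha=0.5, gamma=0.95, eps=0.2,
                            window=10, p_back=0.2, punishment=0.05, seed=0):
    """Epsilon-greedy Q-learning plus backtracking exploration: with probability
    p_back, jump to a state drawn from a sliding time window of recently
    visited states, charging a small punishment on the following transition."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(env.n)]
    for _ in range(episodes):
        s = env.reset(0)
        recent = [s]                      # sliding time window of visited states
        penalty = 0.0
        for _ in range(4 * env.n):
            if rng.random() < p_back:
                s = env.reset(rng.choice(recent))   # backtrack via state reset
                penalty = punishment                # punish the next transition
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)      # explore (ties broken randomly)
            else:
                a = int(Q[s][1] > Q[s][0])
            s2, r, done = env.step(a)
            r, penalty = r - penalty, 0.0
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            recent = (recent + [s2])[-window:]
            s = s2
            if done:
                break
    return Q


env = ChainMDP(8)
Q = backtracking_q_learning(env)
policy = [int(Q[s][1] > Q[s][0]) for s in range(env.n - 1)]
print(policy)   # the learned greedy policy should step right, toward the goal
```

Note how the backtracking mechanism plugs in without touching the epsilon-greedy choice or the Q-update itself, matching the abstract's claim that the scheme composes with existing exploration strategies and value-function update rules.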
Pages: 7