Balancing exploration and exploitation in episodic reinforcement learning

Cited by: 2
Authors
Chen, Qihang [1]
Zhang, Qiwei [1]
Liu, Yunlong [1]
Affiliations
[1] Xiamen Univ, Dept Automat, Xiamen 361005, Fujian, Peoples R China
Keywords
Reinforcement learning; Episodic tasks; Sparse and delayed rewards; Exploration and exploitation; Entropy-based intrinsic incentives
DOI
10.1016/j.eswa.2023.120801
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
One of the major challenges in reinforcement learning (RL) is its application to episodic tasks, such as chess, molecular structure design, and healthcare, where rewards are usually sparse and can only be obtained at the end of an episode. Such episodic RL tasks place stringent demands on the exploration and credit-assignment capabilities of the agent. Many techniques have been presented to address these two issues: various exploration methods increase the agent's ability to gather diverse experience samples, while for the delayed-reward problem, reward redistribution methods provide dense, task-oriented guidance by reshaping the sparse and delayed environmental rewards with the assistance of episodic feedback. Although some successes have been achieved, agents using existing techniques are usually unable to quickly assign credit to the explored key transitions, or the methods are prone to being misled by behavior policies that fall into local optima, leading to sluggish learning. To alleviate the inefficient learning caused by sparse and delayed rewards, we propose a guided reward approach, Exploratory Intrinsic with Mission Guidance Reward (EMR), which organically combines the intrinsic rewards of exploration mechanisms with reward redistribution to balance the exploration and exploitation of RL agents in such tasks. By using entropy-based intrinsic incentives together with a simple uniform reward redistribution method, EMR equips an agent with both strong exploration and strong exploitation capabilities, enabling it to efficiently solve challenging tasks with sparse and delayed rewards. We evaluated and analyzed EMR on several tasks from the DeepMind Control Suite benchmark; experimental results show that the EMR-equipped agent learns faster and can even outperform agents that use an exploration bonus or a reward redistribution method alone.
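The reward construction described in the abstract can be made concrete with a short Python sketch. This is an illustration of the stated idea, not the authors' implementation: the episodic return is spread uniformly over all transitions (uniform reward redistribution), and a per-step policy-entropy estimate serves as the entropy-based intrinsic incentive. The function name emr_rewards, the weight beta, and the choice of policy entropy as the intrinsic signal are hypothetical assumptions.

    import numpy as np

    def emr_rewards(episode_rewards, step_entropies, beta=0.1):
        """Sketch of an EMR-style shaped reward for one finished episode.

        episode_rewards: per-step extrinsic rewards; in sparse episodic
            tasks nearly all of the mass sits in the final entry.
        step_entropies: per-step policy-entropy estimates H(pi(.|s_t)),
            standing in for the entropy-based intrinsic incentive.
        beta: weight of the intrinsic term (hypothetical name and value).
        """
        rewards = np.asarray(episode_rewards, dtype=np.float64)
        entropies = np.asarray(step_entropies, dtype=np.float64)
        T = len(rewards)

        # Uniform reward redistribution: spread the episodic return
        # evenly over every transition in the episode.
        redistributed = np.full(T, rewards.sum() / T)

        # Entropy-based intrinsic incentive: reward steps where the
        # policy stays stochastic, encouraging continued exploration.
        return redistributed + beta * entropies

    # Usage: a sparse task where only the final step pays off.
    ext = [0.0] * 9 + [10.0]
    ent = [1.2, 1.1, 1.0, 0.9, 0.9, 0.8, 0.8, 0.7, 0.6, 0.5]
    shaped = emr_rewards(ext, ent)  # dense signal: 1.0 + 0.1 * H_t per step

The point of the combination is visible in the usage example: redistribution alone yields a flat 1.0 per step (dense exploitation signal), while the entropy term differentiates steps so the agent is still pushed toward under-explored, high-uncertainty behavior.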
Pages: 8
Related Papers
50 entries in total
  • [31] A stable data-augmented reinforcement learning method with ensemble exploration and exploitation
Zuo, Guoyu
Tian, Zhipeng
Huang, Gao
    Applied Intelligence, 2023, 53 : 24792 - 24803
  • [32] Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study
    Lovatto, Angelo Gregorio
    de Barros, Leliane Nunes
    Maua, Denis D.
    INTELLIGENT SYSTEMS, PT II, 2022, 13654 : 30 - 44
  • [33] Balancing exploration and exploitation in multiobjective evolutionary optimization
    Zhang, Hu
    Sun, Jianyong
    Liu, Tonglin
    Zhang, Ke
    Zhang, Qingfu
    INFORMATION SCIENCES, 2019, 497 : 129 - 148
  • [34] Reinforcement learning: exploration–exploitation dilemma in multi-agent foraging task
Yogeswaran, Mohan
Ponnambalam, S. G.
    OPSEARCH, 2012, 49 (3) : 223 - 236
  • [35] A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation
    Li, Han
    Chen, Tianding
    Teng, Hualiang
    Jiang, Yingtao
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2019, 118 (02) : 253 - +
  • [36] Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation
    Yao, Yao
    Xiao, Li
    An, Zhicheng
    Zhang, Wanpeng
    Luo, Dijun
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4202 - 4208
  • [37] Balancing Exploration and Exploitation in Supply Chain Portfolios
    Chiu, Yi-Chia
    IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, 2014, 61 (01) : 18 - 27
  • [38] Strategies for balancing exploration and exploitation in electromagnetic optimisation
    Xiao, Song
    Rotaru, Mihai
    Sykulski, Jan K.
    COMPEL-THE INTERNATIONAL JOURNAL FOR COMPUTATION AND MATHEMATICS IN ELECTRICAL AND ELECTRONIC ENGINEERING, 2013, 32 (04) : 1176 - 1188
  • [39] Individual exploration and selective social learning: balancing exploration-exploitation trade-offs in collective foraging
    Garg, Ketika
    Kello, Christopher T.
    Smaldino, Paul E.
    JOURNAL OF THE ROYAL SOCIETY INTERFACE, 2022, 19 (189)
  • [40] Exploration and Exploitation in Online Learning
    Auer, Peter
    ADAPTIVE AND INTELLIGENT SYSTEMS, 2011, 6943 : 2 - 2