Balancing exploration and exploitation in episodic reinforcement learning

Cited by: 2
Authors
Chen, Qihang [1 ]
Zhang, Qiwei [1 ]
Liu, Yunlong [1 ]
Affiliation
[1] Xiamen University, Department of Automation, Xiamen 361005, Fujian, People's Republic of China
Keywords
Reinforcement learning; Episodic tasks; Sparse and delayed rewards; Exploration and exploitation; Entropy-based intrinsic incentives
DOI
10.1016/j.eswa.2023.120801
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
One of the major challenges in reinforcement learning (RL) is its application to episodic tasks, such as chess, molecular structure design, and healthcare, where rewards are typically sparse and obtained only at the end of an episode. Such episodic RL tasks place stringent demands on the agent's exploration and credit-assignment capabilities. Many techniques have been proposed to address these two issues: various exploration methods increase the agent's ability to gather diverse experience samples, while reward redistribution methods tackle the delayed-reward problem by reshaping the sparse, delayed environmental rewards into dense, task-oriented guidance using the episodic feedback. Despite some successes, agents using existing techniques are often unable to quickly assign credit to the explored key transitions, or the methods are prone to being misled by behavior policies stuck in local optima, leading to sluggish learning. To alleviate inefficient learning caused by sparse and delayed rewards, we propose a guided reward approach, Exploratory Intrinsic with Mission Guidance Reward (EMR), which combines the intrinsic rewards of exploration mechanisms with reward redistribution to balance exploration and exploitation in such tasks. By pairing entropy-based intrinsic incentives with a simple uniform reward redistribution scheme, EMR equips an agent with both strong exploration and strong exploitation capabilities, allowing it to efficiently solve challenging tasks with sparse and delayed rewards. We evaluated and analyzed EMR on several tasks from the DeepMind Control Suite benchmark; experimental results show that the EMR-equipped agent learns faster, and often performs better, than agents using the exploration bonus or the reward redistribution method alone.
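The abstract describes EMR only at a high level. As a rough illustration of the two ingredients it names, the Python sketch below combines a uniform redistribution of the delayed episodic return with a per-step policy-entropy bonus. The function name emr_shaped_rewards, the beta weight, and the additive combination are assumptions made for illustration, not the authors' implementation.

import numpy as np

def emr_shaped_rewards(episode_return, policy_entropies, beta=0.1):
    # Hypothetical sketch of EMR-style reward shaping (illustrative only).
    # episode_return: scalar reward observed only at the end of the episode.
    # policy_entropies: per-step entropy H(pi(.|s_t)) of the behavior policy,
    #   serving as the entropy-based intrinsic incentive.
    # beta: assumed weight on the intrinsic term.
    T = len(policy_entropies)
    # Uniform reward redistribution: spread the sparse, delayed episodic
    # reward evenly over every transition in the episode.
    redistributed = np.full(T, episode_return / T)
    # Entropy-based intrinsic bonus: reward steps where the policy is
    # still uncertain, encouraging exploration.
    intrinsic = beta * np.asarray(policy_entropies, dtype=float)
    return redistributed + intrinsic

# Example: a 4-step episode that yielded a terminal return of 1.0.
shaped = emr_shaped_rewards(1.0, [1.2, 0.9, 0.7, 0.5])
# shaped == [0.37, 0.34, 0.32, 0.30]; the agent would train on these
# dense per-step rewards instead of the single delayed one.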
Pages: 8