Balancing exploration and exploitation in episodic reinforcement learning

Cited by: 2
Authors
Chen, Qihang [1 ]
Zhang, Qiwei [1 ]
Liu, Yunlong [1 ]
Affiliations
[1] Xiamen Univ, Dept Automat, Xiamen 361005, Fujian, Peoples R China
Keywords
Reinforcement learning; Episodic tasks; Sparse and delayed rewards; Exploration and exploitation; Entropy-based intrinsic incentives;
DOI
10.1016/j.eswa.2023.120801
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
One of the major challenges in reinforcement learning (RL) is its application to episodic tasks, such as chess, molecular structure design, and healthcare, where rewards are usually sparse and can only be obtained at the end of an episode. Such episodic RL tasks place stringent demands on the exploration and credit-assignment capabilities of the agent. Many techniques have been proposed to address these two issues: various exploration methods increase the agent's ability to gather diverse experience samples, while reward redistribution methods address the delayed-reward problem by reshaping the sparse, delayed environmental rewards into dense, task-oriented guidance with the assistance of the episodic feedback. Despite some successes, with existing techniques the agent is usually unable to quickly assign credit to the key transitions it has explored, or the methods are prone to being misled by behavioral policies that fall into local optima, leading to sluggish learning efficiency. To alleviate this inefficient learning under sparse and delayed rewards, we propose a guided reward approach, Exploratory Intrinsic with Mission Guidance Reward (EMR), which organically combines the intrinsic rewards of exploration mechanisms with reward redistribution to balance the exploration and exploitation of RL agents in such tasks. By using entropy-based intrinsic incentives and a simple uniform reward redistribution method, EMR equips an agent with both strong exploration and strong exploitation capabilities, enabling it to efficiently overcome challenging tasks with sparse and delayed rewards. We evaluated and analyzed EMR on several tasks from the DeepMind Control Suite benchmark; experimental results show that the EMR-equipped agent learns faster and even performs better than agents using the exploration bonus or the reward redistribution method alone.
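The abstract gives only a high-level description of EMR's shaped reward, so the following is a minimal illustrative sketch rather than the paper's actual implementation: a per-step intrinsic bonus based on the entropy of the policy's action distribution, added to a uniform redistribution of the episodic return over all steps. The function names (entropy_bonus, uniform_redistribution, shaped_rewards) and the weighting coefficient beta are hypothetical, not taken from the paper.

```python
import numpy as np

def entropy_bonus(action_probs, beta=0.01):
    """Entropy of the policy's action distribution, scaled by beta.
    Higher-entropy (more uncertain) policies earn a larger intrinsic
    reward, which encourages exploration."""
    p = np.clip(action_probs, 1e-8, 1.0)
    return beta * -np.sum(p * np.log(p))

def uniform_redistribution(episodic_return, episode_length):
    """Spread the single end-of-episode reward evenly over all steps,
    turning a sparse, delayed signal into a dense per-step one."""
    return np.full(episode_length, episodic_return / episode_length)

def shaped_rewards(episodic_return, action_prob_trajectory, beta=0.01):
    """Per-step reward = uniformly redistributed episodic return
    (exploitation guidance) + entropy-based intrinsic incentive
    (exploration bonus)."""
    T = len(action_prob_trajectory)
    dense = uniform_redistribution(episodic_return, T)
    intrinsic = np.array([entropy_bonus(p, beta) for p in action_prob_trajectory])
    return dense + intrinsic

# Example: a 5-step episode whose only environmental reward, 1.0,
# arrives at the terminal step.
probs = [np.array([0.7, 0.2, 0.1])] * 5
print(shaped_rewards(1.0, probs))
```

The design intuition matches what the abstract states: the redistributed return supplies dense task-oriented guidance (exploitation), while the entropy term keeps the behavioral policy from collapsing into a local optimum too early (exploration); beta trades the two off.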
Pages: 8