A novel model-based reinforcement learning algorithm for solving the problem of unbalanced reward

Cited by: 2
Authors
Yuan, Yinlong [1 ]
Hua, Liang [1 ]
Cheng, Yun [1 ]
Li, Junhong [1 ]
Sang, Xiaohu [1 ]
Zhang, Lei [1 ]
Wei, Wu [2 ]
Affiliations
[1] Nantong Univ, Coll Elect Engn, Nantong, Peoples R China
[2] South China Univ Technol, Coll Automat Sci & Engn, 381 Wushan Rd, Guangzhou 510641, Peoples R China
Keywords
Reinforcement learning; Model-based learning; Unbalanced reward; Multi-step methods; Neural networks
DOI
10.3233/JIFS-210956
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning algorithms driven by reward signals can be used to solve sequential decision-making problems. In practice, however, they still suffer from the problem of reward imbalance, which limits their use in many contexts. To address this unbalanced reward problem, we propose a novel model-based reinforcement learning algorithm called expected n-step value iteration (EnVI). Unlike traditional model-based reinforcement learning algorithms, the proposed method uses a new return function that changes the discounting of future rewards while reducing the influence of the current reward. We evaluated the proposed algorithm on a Treasure-Hunting game and a Hill-Walking game. The results demonstrate that it reduces the negative impact of unbalanced rewards and substantially improves the performance of traditional reinforcement learning algorithms.
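The record does not give EnVI's exact return function, only that it modifies how future rewards are discounted relative to the current reward. For context, a minimal sketch of the conventional n-step return that multi-step value-iteration methods build on (function name and numbers are illustrative, not from the paper):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Standard n-step return:
    G_t = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V(s_{t+n}),
    computed backwards from the bootstrapped value of the final state."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three observed rewards, then a bootstrapped state value of 10.0:
g = n_step_return([1.0, 0.0, 2.0], 10.0, gamma=0.9)
```

EnVI's variant would reweight the terms of this sum (down-weighting the immediate reward r_t) rather than applying the uniform geometric discount shown here; the exact weighting is specified in the full paper.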
Pages: 3233-3243
Number of pages: 11
Cited References (21 in total)
[1] Amodei D. 2016, arXiv. DOI: 10.48550/arXiv.1606.06565
[2] Aslund H. 2018, arXiv.
[3] Hadfield-Menell D. 2017, Advances in Neural Information Processing Systems, V30.
[4] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444.
[5] Leike J. 2017, arXiv. DOI: arXiv:1711.09883
[6] Marco D. Markov Random Processes Are Neither Bandlimited nor Recoverable From Samples or After Quantization. IEEE Transactions on Information Theory, 2009, 55(2): 900-905.
[7] Mnih V. 2013, arXiv. DOI: 10.48550/arXiv.1312.5602
[8] Mnih V. 2016, Proceedings of Machine Learning Research, V48.
[9] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533.
[10] Mohsen F, Wang J, Al-Sabahi K. A hierarchical self-attentive neural extractive summarizer via reinforcement learning (HSASRL). Applied Intelligence, 2020, 50(9): 2633-2646.