A novel model-based reinforcement learning algorithm for solving the problem of unbalanced reward

Cited by: 2
Authors
Yuan, Yinlong [1]
Hua, Liang [1]
Cheng, Yun [1]
Li, Junhong [1]
Sang, Xiaohu [1]
Zhang, Lei [1]
Wei, Wu [2]
Affiliations
[1] Nantong Univ, Dept Coll Elect Engn, Nantong, Peoples R China
[2] South China Univ Technol, Dept Coll Automat Sci & Engn, 381 Wushan Rd, Guangzhou 510641, Peoples R China
Keywords
Reinforcement learning; Model-based learning; Unbalanced reward; Multi-step methods; Neural networks
DOI
10.3233/JIFS-210956
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning algorithms driven by reward signals can be used to solve sequential learning problems. In practice, however, they still suffer from reward imbalance, which limits their use in many contexts. To address this unbalanced reward problem, we propose a novel model-based reinforcement learning algorithm called expected n-step value iteration (EnVI). Unlike traditional model-based reinforcement learning algorithms, the proposed method uses a new return function that changes the discount applied to future rewards while reducing the influence of the current reward. We evaluated the proposed algorithm on a Treasure-Hunting game and a Hill-Walking game. The results demonstrate that it can reduce the negative impact of unbalanced rewards and greatly improve on the performance of traditional reinforcement learning algorithms.
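For context, the traditional baseline that the abstract contrasts against can be sketched as ordinary tabular value iteration. The sketch below is not EnVI and does not reproduce the paper's return function or its games, neither of which is given in this record. The four-state chain, the reward magnitudes, and the names `build_chain_mdp` and `value_iteration` are all hypothetical and chosen only for illustration. It shows how the discount factor alone decides whether a small immediate reward or a larger delayed reward wins under the standard discounted return.

```python
import numpy as np


def build_chain_mdp():
    """Hypothetical 4-state chain (illustrative only, not from the paper).

    State 0: action 0 grabs a small reward (+1) and ends the episode;
             action 1 walks on with no reward.
    States 1 and 2: every action moves one step right; leaving state 2 pays +10.
    State 3: absorbing terminal state with zero reward.
    """
    n_states, n_actions = 4, 2
    P = np.zeros((n_states, n_actions, n_states))  # P[s, a, s'] transition probabilities
    R = np.zeros((n_states, n_actions))            # R[s, a] expected immediate reward
    P[0, 0, 3] = 1.0
    R[0, 0] = 1.0
    P[0, 1, 1] = 1.0
    P[1, :, 2] = 1.0
    P[2, :, 3] = 1.0
    R[2, :] = 10.0
    P[3, :, 3] = 1.0
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Standard value iteration with the usual discounted one-step backup."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)      # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)       # state values and greedy policy


if __name__ == "__main__":
    P, R = build_chain_mdp()
    for gamma in (0.9, 0.3):
        V, policy = value_iteration(P, R, gamma=gamma)
        print(f"gamma={gamma}: V[0]={V[0]:.3f}, action in state 0 = {policy[0]}")
```

In this toy chain the greedy action in state 0 is to walk on when gamma is 0.9 and to grab the small reward when gamma is 0.3, so the outcome depends entirely on how the return weighs current against future rewards. That weighting is the quantity the abstract says EnVI modifies.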
Pages: 3233-3243
Page count: 11