Generalized Maximum Entropy Reinforcement Learning via Reward Shaping

Cited by: 2
Authors
Tao F. [1 ]
Wu M. [2 ]
Cao Y. [2 ]
Affiliations
[1] Volvo Car Technology USA LLC, Sunnyvale, CA 94085
[2] University of Texas at San Antonio, Department of Electrical Engineering, San Antonio, TX 78249
Source
IEEE Transactions on Artificial Intelligence | 2024 / Vol. 5 / No. 4
Keywords
Entropy; reinforcement learning (RL); reward shaping
DOI
10.1109/TAI.2023.3297988
Abstract
Entropy regularization is a commonly used technique in reinforcement learning to improve exploration and cultivate a better pre-trained policy for later adaptation. Recent studies further show that entropy regularization can smooth the optimization landscape and simplify policy optimization, indicating the value of integrating entropy into reinforcement learning. However, existing studies only consider the policy's entropy at the current state as an extra regularization term in the policy gradient or in the objective function, without formally integrating the entropy into the reward function. In this article, we propose a shaped reward that incorporates the agent's policy entropy into the reward function. In particular, the agent's expected entropy over the distribution of the next state is added to the immediate reward associated with the current state. Adding the agent's expected policy entropy at the next-state distribution is shown to yield a new soft Q-function and state value function that are concise and modular. Moreover, the new reinforcement learning framework can be easily applied to existing standard reinforcement learning algorithms, such as deep Q-network (DQN) and proximal policy optimization (PPO), while inheriting the benefits of entropy regularization. We further present a soft stochastic policy gradient theorem based on the shaped reward and propose a new practical reinforcement learning algorithm. Finally, experimental studies in MuJoCo environments demonstrate that our method can outperform the state-of-the-art off-policy maximum entropy reinforcement learning approach, soft actor-critic (SAC), by 5%-150% in terms of average return. © 2020 IEEE.
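As a rough sketch of the shaping described above (reconstructed from this abstract alone; the temperature coefficient \alpha and the notation below are assumptions, not the paper's own definitions), the shaped reward adds the expected next-state policy entropy to the immediate reward:

\tilde{r}(s_t, a_t) = r(s_t, a_t) + \alpha \, \mathbb{E}_{s_{t+1} \sim p(\cdot \mid s_t, a_t)} \big[ \mathcal{H}\big( \pi(\cdot \mid s_{t+1}) \big) \big], \quad \text{where } \mathcal{H}\big(\pi(\cdot \mid s)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)} \big[ \log \pi(a \mid s) \big].

Under this sketch, a standard algorithm such as DQN or PPO can simply be run on \tilde{r} in place of r, which is how the framework can inherit the benefits of entropy regularization without modifying the underlying algorithm.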
Pages: 1563-1572
Number of pages: 9
Related papers
50 items in total
  • [31] The generalized maximum belief entropy model
    Li, Siran
    Cai, Rui
    SOFT COMPUTING, 2022, 26 (09) : 4187 - 4198
  • [32] Generalized Maximum Entropy for Supervised Classification
    Mazuelas, Santiago
    Shen, Yuan
    Perez, Aritz
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (04) : 2530 - 2550
  • [34] Potential-based reward shaping using state-space segmentation for efficiency in reinforcement learning
    Bal, Melis Ilayda
    Aydin, Huseyin
    Iyigun, Cem
    Polat, Faruk
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 157 : 469 - 484
  • [35] A Multi-Dimensional Goal Aircraft Guidance Approach Based on Reinforcement Learning with a Reward Shaping Algorithm
    Zu, Wenqiang
    Yang, Hongyu
    Liu, Renyu
    Ji, Yulong
    SENSORS, 2021, 21 (16)
  • [36] Population-based exploration in reinforcement learning through repulsive reward shaping using eligibility traces
    Bal, Melis Ilayda
    Iyigun, Cem
    Polat, Faruk
    Aydin, Huseyin
    ANNALS OF OPERATIONS RESEARCH, 2024, 335 (02) : 689 - 725
  • [37] Multi-Agent Meta-Reinforcement Learning with Coordination and Reward Shaping for Traffic Signal Control
    Du, Xin
    Wang, Jiahai
    Chen, Siyuan
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT II, 2023, 13936 : 349 - 360
  • [38] Car-Following Behavior Modeling With Maximum Entropy Deep Inverse Reinforcement Learning
    Nan, Jiangfeng
    Deng, Weiwen
    Zhang, Ruzheng
    Zhao, Rui
    Wang, Ying
    Ding, Juan
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (02) : 3998 - 4010
  • [39] Reward shaping via expectation maximization method
    Deng, Zelin
    Liu, Xing
    Dong, Yunlong
    NEUROCOMPUTING, 2024, 609
  • [40] Bottom-up multi-agent reinforcement learning by reward shaping for cooperative-competitive tasks
    Aotani, Takumi
    Kobayashi, Taisuke
    Sugimoto, Kenji
    APPLIED INTELLIGENCE, 2021, 51 (07) : 4434 - 4452