Entropy regularization methods for parameter space exploration

Cited by: 5
Authors
Han, Shuai [1 ,2 ,3 ]
Zhou, Wenbo [1 ,4 ,5 ]
Lu, Shuai [1 ,2 ,6 ]
Zhu, Sheng [1 ,6 ]
Gong, Xiaoyu [1 ,2 ]
Affiliations
[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, NL-3584 CC Utrecht, Netherlands
[4] Northeast Normal Univ, Sch Informat Sci & Technol, Changchun 130117, Peoples R China
[5] Northeast Normal Univ, Minist Educ, Key Lab Appl Stat, Changchun 130024, Peoples R China
[6] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Reinforcement learning; Entropy regularization; Exploration; Parameter spaces; Deterministic policy gradients;
DOI
10.1016/j.ins.2022.11.099
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Entropy regularization is an important approach to improving exploration and enhancing policy stability in reinforcement learning. However, in previous studies, entropy regularization has been applied only to action spaces. In this paper, we apply entropy regularization to parameter spaces. We parameterize the policy network with learnable noisy layers to obtain a learnable entropy, and we derive the expression for the entropy of each noisy parameter as well as an upper bound on the joint entropy. Based on these results, we propose a model-free method named deep pseudo deterministic policy gradients based on entropy regularization (DPGER). This method maximizes the entropy of each noisy parameter early in learning to promote exploration, and minimizes the joint entropy of the noisy parameters later in learning to facilitate the formation of stable policies. We test our method on four MuJoCo environments with five random seeds. The results show that our method outperforms previous methods. (c) 2022 Elsevier Inc. All rights reserved.
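
To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of a linear layer with learnable Gaussian parameter noise, a per-parameter entropy, and a sign-switching regularizer that rewards high entropy early in training and low entropy later. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian noise model, the names NoisyLinear, parameter_entropies, entropy_regularizer, sigma_init, and switch_step, and the hard early/late phase switch are all assumptions for this sketch; the paper's upper bound on the joint entropy is not reproduced here.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer whose parameters are perturbed by learnable Gaussian noise (hypothetical sketch)."""
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.log_sigma_w = nn.Parameter(torch.full((out_features, in_features), math.log(sigma_init)))
        self.mu_b = nn.Parameter(torch.zeros(out_features))
        self.log_sigma_b = nn.Parameter(torch.full((out_features,), math.log(sigma_init)))

    def forward(self, x):
        # Sample a fresh Gaussian perturbation of every parameter on each forward pass.
        w = self.mu_w + self.log_sigma_w.exp() * torch.randn_like(self.mu_w)
        b = self.mu_b + self.log_sigma_b.exp() * torch.randn_like(self.mu_b)
        return F.linear(x, w, b)

    def parameter_entropies(self):
        # Differential entropy of each Gaussian parameter: 0.5 * log(2*pi*e*sigma^2).
        log_sigma = torch.cat([self.log_sigma_w.flatten(), self.log_sigma_b.flatten()])
        return 0.5 * math.log(2 * math.pi * math.e) + log_sigma

def entropy_regularizer(noisy_layers, step, switch_step):
    """Term added to the policy loss: the negative sign early in training maximizes
    per-parameter entropy (exploration); the positive sign later minimizes it
    (policy stabilization). The hard switch at `switch_step` is an assumption."""
    entropies = torch.cat([layer.parameter_entropies() for layer in noisy_layers])
    sign = -1.0 if step < switch_step else 1.0
    return sign * entropies.sum()
```

In this sketch the regularizer would simply be added to the actor's loss at each update step; the actual DPGER schedule and weighting follow the derivations in the paper.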
Pages: 476-489
Number of pages: 14
Related Papers
50 records in total
  • [1] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
    Cen, Shicong
    Cheng, Chen
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
    OPERATIONS RESEARCH, 2021, 70 (04) : 2563 - 2578
  • [2] Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
    Cen, Shicong
    Wei, Yuting
    Chi, Yuejie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 48
  • [3] Proximal Policy Optimization with Entropy Regularization
    Shen, Yuqing
    2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, : 380 - 383
  • [4] Exploratory LQG mean field games with entropy regularization
    Firoozi, Dena
    Jaimungal, Sebastian
    AUTOMATICA, 2022, 139
  • [5] Structural Parameter Space Exploration for Reinforcement Learning via a Matrix Variate Distribution
    Wang, Shaochen
    Yang, Rui
    Li, Bin
    Kan, Zhen
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2023, 7 (04): : 1025 - 1035
  • [6] Comparison between method of moments and entropy regularization algorithm applied to parameter estimation for mixed-Weibull distribution
    Hung, Wen-Liang
    Chang, Yen-Chang
    JOURNAL OF APPLIED STATISTICS, 2011, 38 (12) : 2709 - 2722
  • [7] Robotic Navigation using Entropy-Based Exploration
    Usama, Muhammad
    Chang, Dong Eui
    2019 19TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2019), 2019, : 1322 - 1327
  • [8] Entropy regularization for weakly supervised object localization
    Hwang, Dongjun
    Ha, Jung-Woo
    Shim, Hyunjung
    Choe, Junsuk
    PATTERN RECOGNITION LETTERS, 2023, 169 : 1 - 7
  • [9] Entropy regularization for unsupervised clustering with adaptive neighbors
    Wang, Jingyu
    Ma, Zhenyu
    Nie, Feiping
    Li, Xuelong
    PATTERN RECOGNITION, 2022, 125
  • [10] Entropy Regularization for Mean Field Games with Learning
    Guo, Xin
    Xu, Renyuan
    Zariphopoulou, Thaleia
    MATHEMATICS OF OPERATIONS RESEARCH, 2022: 3239-3260