Entropy regularization methods for parameter space exploration

Times cited: 5
Authors
Han, Shuai [1, 2, 3]
Zhou, Wenbo [1, 4, 5]
Lu, Shuai [1, 2, 6]
Zhu, Sheng [1, 6]
Gong, Xiaoyu [1, 2]
Affiliations
[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, NL-3584 CC Utrecht, Netherlands
[4] Northeast Normal Univ, Sch Informat Sci & Technol, Changchun 130117, Peoples R China
[5] Northeast Normal Univ, Minist Educ, Key Lab Appl Stat, Changchun 130024, Peoples R China
[6] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Reinforcement learning; Entropy regularization; Exploration; Parameter spaces; Deterministic policy gradients;
DOI
10.1016/j.ins.2022.11.099
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Entropy regularization is an important approach to improving exploration and enhancing policy stability in reinforcement learning. However, previous studies apply entropy regularization only to action spaces. In this paper, we apply entropy regularization to parameter spaces. We use learnable noisy layers to parameterize the policy network, which yields a learnable entropy. We also derive an expression for the entropy of a noisy parameter and an upper bound on the joint entropy of the noisy parameters. Based on these results, we propose a model-free method named deep pseudo deterministic policy gradients based on entropy regularization (DPGER). This method maximizes the entropy of each noisy parameter early in learning to promote exploration, and minimizes the joint entropy of the noisy parameters later in learning to facilitate the formation of stable policies. We test our method on four MuJoCo environments with five random seeds. The results show that our method achieves better performance than previous methods. (c) 2022 Elsevier Inc. All rights reserved.
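The two-phase scheme in the abstract can be illustrated with a minimal Python sketch under stated assumptions. It assumes each noisy parameter is w = mu + sigma * eps with independent standard Gaussian eps (a NoisyNet-style layer; the paper's actual noise model, entropy expression, and bound may differ). Under that assumption the entropy of one parameter is 0.5 * ln(2*pi*e*sigma^2), and the sum of per-parameter entropies is an upper bound on the joint entropy (subadditivity, with equality under independence). The step-based switch, the coefficient beta, the switch fraction, and all function names are hypothetical illustrations, not the DPGER implementation.

```python
import math


def gaussian_entropy(sigma: float) -> float:
    """Differential entropy (nats) of N(mu, sigma^2); it does not depend on mu."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def joint_entropy_upper_bound(sigmas) -> float:
    """Subadditivity: H(w_1, ..., w_n) <= sum_i H(w_i), tight when the w_i are independent."""
    return sum(gaussian_entropy(s) for s in sigmas)


def entropy_coefficient(step: int, total_steps: int,
                        switch_fraction: float = 0.5, beta: float = 0.01) -> float:
    """Hypothetical schedule: +beta early (raise entropy), -beta late (lower it)."""
    return beta if step < switch_fraction * total_steps else -beta


def regularized_objective(policy_objective: float, sigmas, step: int,
                          total_steps: int) -> float:
    """Objective to maximize: policy term plus a signed entropy term.

    Early on, the positive coefficient rewards larger per-parameter entropies
    (more exploration). Later, the negative coefficient penalizes the sum of
    per-parameter entropies, i.e. the upper bound on the joint entropy, so
    maximizing the objective pushes the noise scales down (a more stable policy).
    """
    coef = entropy_coefficient(step, total_steps)
    return policy_objective + coef * joint_entropy_upper_bound(sigmas)


if __name__ == "__main__":
    sigmas = [1.0, 2.0]
    print(joint_entropy_upper_bound(sigmas))
    print(regularized_objective(0.0, sigmas, step=10, total_steps=100))
    print(regularized_objective(0.0, sigmas, step=90, total_steps=100))
```

For example, with noise scales 1.0 and 2.0 the entropy term is about 3.53 nats; only the sign of the coefficient changes between phases, which decides whether optimization pushes the noise scales up or down.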
Pages: 476-489
Number of pages: 14