Entropy regularization methods for parameter space exploration

Cited by: 5
Authors
Han, Shuai [1 ,2 ,3 ]
Zhou, Wenbo [1 ,4 ,5 ]
Lu, Shuai [1 ,2 ,6 ]
Zhu, Sheng [1 ,6 ]
Gong, Xiaoyu [1 ,2 ]
Affiliations
[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, NL-3584 CC Utrecht, Netherlands
[4] Northeast Normal Univ, Sch Informat Sci & Technol, Changchun 130117, Peoples R China
[5] Northeast Normal Univ, Minist Educ, Key Lab Appl Stat, Changchun 130024, Peoples R China
[6] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
Reinforcement learning; Entropy regularization; Exploration; Parameter spaces; Deterministic policy gradients;
DOI
10.1016/j.ins.2022.11.099
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Entropy regularization is an important approach to improving exploration and enhancing policy stability in reinforcement learning. However, in previous studies, entropy regularization has been applied only to action spaces. In this paper, we apply entropy regularization to parameter spaces. We use learnable noisy layers to parameterize the policy network and thereby obtain a learnable entropy. We also derive an expression for the entropy of each noisy parameter and an upper bound on their joint entropy. Based on these results, we propose a model-free method named deep pseudo deterministic policy gradients based on entropy regularization (DPGER). This method maximizes the entropy of each noisy parameter early in learning to promote exploration, and minimizes the joint entropy of the noisy parameters later in learning to facilitate the formation of stable policies. We test our method on four MuJoCo environments with five random seeds. The results show that our method achieves better performance than previous methods. (c) 2022 Elsevier Inc. All rights reserved.
Pages: 476-489
Page count: 14
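The noisy-layer construction the abstract describes can be illustrated with a minimal sketch. This is not the paper's DPGER implementation; it is a hypothetical NumPy example assuming NoisyNet-style layers with independent Gaussian noise on each weight, where the class name, initialization, and regularization schedule are illustrative choices. It shows how a learnable per-parameter entropy arises and how summing the marginal entropies gives an independence-based upper bound on the joint entropy.

```python
import numpy as np

class NoisyLinear:
    """Linear layer whose weights carry learnable Gaussian noise.

    Each weight is sampled as w = mu + sigma * eps with eps ~ N(0, 1),
    so every parameter has a closed-form differential entropy
    0.5 * ln(2*pi*e*sigma^2) that an optimizer can push up (to explore)
    or down (to stabilize the policy).
    """

    def __init__(self, in_dim, out_dim, sigma_init=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.mu = self.rng.normal(0.0, 1.0 / np.sqrt(in_dim), (out_dim, in_dim))
        # sigma is stored in log space so it stays positive under gradient steps
        self.log_sigma = np.full((out_dim, in_dim),
                                 np.log(sigma_init / np.sqrt(in_dim)))

    def forward(self, x):
        eps = self.rng.standard_normal(self.mu.shape)
        w = self.mu + np.exp(self.log_sigma) * eps  # fresh noisy weights per call
        return x @ w.T

    def param_entropy(self):
        # Differential entropy of each Gaussian weight: 0.5*ln(2*pi*e) + ln(sigma)
        return 0.5 * np.log(2 * np.pi * np.e) + self.log_sigma


layer = NoisyLinear(in_dim=4, out_dim=2)
H = layer.param_entropy()   # per-parameter entropies, learnable via log_sigma
H_joint_bound = H.sum()     # independence upper bound on the joint entropy
# A schedule in the spirit of the abstract: add +lambda * H.sum() to the
# objective early in training (maximize entropy), then switch the sign later
# (minimize the joint-entropy bound) to encourage a stable policy.
```

Storing `log_sigma` rather than `sigma` is a common trick: unconstrained gradient updates keep the noise scale strictly positive, and the entropy is linear in `log_sigma`, which makes the entropy-regularization gradient trivial.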