Entropy regularization methods for parameter space exploration

Cited by: 5
Authors
Han, Shuai [1 ,2 ,3 ]
Zhou, Wenbo [1 ,4 ,5 ]
Lu, Shuai [1 ,2 ,6 ]
Zhu, Sheng [1 ,6 ]
Gong, Xiaoyu [1 ,2 ]
Affiliations
[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, NL-3584 CC Utrecht, Netherlands
[4] Northeast Normal Univ, Sch Informat Sci & Technol, Changchun 130117, Peoples R China
[5] Northeast Normal Univ, Minist Educ, Key Lab Appl Stat, Changchun 130024, Peoples R China
[6] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Reinforcement learning; Entropy regularization; Exploration; Parameter spaces; Deterministic policy gradients;
DOI
10.1016/j.ins.2022.11.099
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Entropy regularization is an important approach to improve exploration and enhance policy stability in reinforcement learning. However, in previous studies, entropy regularization has been applied only to action spaces. In this paper, we apply entropy regularization to parameter spaces. We use learnable noisy layers to parameterize the policy network and thereby obtain a learnable entropy. We also derive the expression for the entropy of each noisy parameter and an upper bound on their joint entropy. Based on these, we propose a model-free method named deep pseudo deterministic policy gradients based on entropy regularization (DPGER). This method maximizes the entropy of each noisy parameter early in learning to promote exploration, and minimizes the joint entropy of the noisy parameters later in learning to facilitate the formation of stable policies. We test our method on four MuJoCo environments with five random seeds. The results show that our method achieves better performance than previous methods. (c) 2022 Elsevier Inc. All rights reserved.
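The abstract's learnable-entropy idea can be illustrated with a small sketch. This is not the authors' DPGER implementation; it assumes NoisyNet-style independent Gaussian noise on each weight, and all names (`NoisyLinear`, `parameter_entropies`, `joint_entropy_upper_bound`) are illustrative. For a Gaussian parameter with scale sigma, the differential entropy is H = 0.5 * log(2 * pi * e * sigma^2), and for independent parameters the sum of these marginal entropies upper-bounds the joint entropy, matching the kind of quantities the abstract describes.

```python
import numpy as np


class NoisyLinear:
    """Sketch of a linear layer with learnable Gaussian parameter noise
    (NoisyNet-style); illustrative only, not the paper's implementation."""

    def __init__(self, in_dim, out_dim, sigma_init=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Learnable mean and noise scale per weight (biases omitted).
        self.w_mu = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (out_dim, in_dim))
        self.w_sigma = np.full((out_dim, in_dim), sigma_init / np.sqrt(in_dim))
        self.rng = rng

    def forward(self, x):
        # Sample noisy weights: w = mu + sigma * eps, eps ~ N(0, 1).
        eps = self.rng.standard_normal(self.w_mu.shape)
        w = self.w_mu + self.w_sigma * eps
        return x @ w.T

    def parameter_entropies(self):
        # Differential entropy of each Gaussian parameter:
        # H = 0.5 * log(2 * pi * e * sigma^2).
        return 0.5 * np.log(2.0 * np.pi * np.e * self.w_sigma ** 2)

    def joint_entropy_upper_bound(self):
        # For independent Gaussian parameters, the sum of marginal
        # entropies upper-bounds (here equals) the joint entropy.
        return self.parameter_entropies().sum()
```

In an entropy-regularized objective of the kind the abstract describes, one would add a bonus proportional to the per-parameter entropies early in training (encouraging exploration) and later penalize the joint-entropy upper bound (encouraging a stable, near-deterministic policy).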
Pages
476-489 (14 pages)
Related Papers
50 records in total
  • [31] On the Prolonged Exploration of Distance Based Parameter Adaptation in SHADE
    Viktorin, Adam
    Senkerik, Roman
    Pluhacek, Michal
    Kadavy, Tomas
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2018, PT I, 2018, 10841 : 561 - 571
  • [32] Variational Bayesian Parameter-Based Policy Exploration
    Hosino, Tikara
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [33] Feudal Latent Space Exploration for Coordinated Multi-Agent Reinforcement Learning
    Liu, Xiangyu
    Tan, Ying
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7775 - 7783
  • [34] Gaussian kernel fuzzy c-means with width parameter computation and regularization
    Simoes, Eduardo C.
    de Carvalho, Francisco de A. T.
    PATTERN RECOGNITION, 2023, 143
  • [35] Fostering links between environmental and space exploration: the Earth and Space Foundation
    Cockell, CS
    White, D
    Messier, D
    Stokes, MD
    SPACE POLICY, 2002, 18 (04) : 301 - 306
  • [36] A review of geophysical methods for geothermal exploration
    Kana, Janvier Domra
    Djongyang, Noel
    Raidandi, Danwe
    Nouck, Philippe Njandjock
    Dadje, Abdouramani
    RENEWABLE & SUSTAINABLE ENERGY REVIEWS, 2015, 44 : 87 - 95
  • [37] Reduction of Genetic Drift in Population-Based Incremental Learning via Entropy Regularization
    Hamano, Ryoki
    Shirakawa, Shinichi
    PROCEEDINGS OF THE 2022 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2022, 2022, : 491 - 494
  • [38] Analysis Classes for Space Exploration Reliability Assessments
    Franzini, Benjamin J.
    Putney, Blake F.
    ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM, 2009 PROCEEDINGS, 2009, : 188 - +
  • [39] Space Exploration: Current Thinking on the Notion of Otherness
    Arnould, Jacques
    THEOLOGY AND SCIENCE, 2018, 16 (01) : 54 - 61
  • [40] Typed feature structures and design space exploration
    Woodbury, R
    Burrow, A
    Datta, S
    Chang, TW
    AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 1999, 13 (04): : 287 - 302