Structural Parameter Space Exploration for Reinforcement Learning via a Matrix Variate Distribution

Cited by: 4
Authors
Wang, Shaochen [1]
Yang, Rui [2]
Li, Bin [2]
Kan, Zhen [1]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
[2] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230052, Peoples R China
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2023 / Vol. 7 / No. 04
Funding
National Natural Science Foundation of China;
Keywords
Uncertainty; Neural networks; Covariance matrices; Space exploration; Noise measurement; Matrix decomposition; Symmetric matrices; Reinforcement learning; parameter space exploration; weight uncertainty;
DOI
10.1109/TETCI.2022.3140380
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The trade-off between exploration and exploitation is essential for reinforcement learning, where an agent needs to be aware of when to explore for high-reward policies and when to exploit the optimal policy known so far. Parameter space exploration provides an elegant solution. As one of the principal methods, injecting noise into the model parameters greatly improves exploration. However, directly stretching the parameters of the neural network into a vector and generating noise for this vector ignore the structural information of the model. In this paper, we aim to incorporate spatial information into weight matrices and propose matrix-variate noise exploration, which exploits the structural weight uncertainty brought by matrix-variate noise to enhance the stochasticity of the agent. Indeed, we construct a bridge between matrix noise exploration and probabilistic neural networks, which theoretically explains the improved performance of parameter space exploration. Extensive experiments have shown that matrix-variate noise exploration outperforms fully factorized noisy exploration on most Atari tasks and Super Mario Bros tasks and is competitive with the state-of-the-art methods.
Pages: 1025-1035
Number of pages: 11
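
Illustrative sketch (not from the paper): the abstract contrasts fully factorized noise, where each weight is perturbed independently, with matrix-variate noise that keeps the row and column structure of a weight matrix. The abstract does not give the exact distribution or parameterization, so the code below assumes a matrix normal distribution and draws W = M + A Z B^T, where A A^T is a row covariance, B B^T is a column covariance, and Z has independent standard normal entries. The names `cholesky`, `matmul`, `transpose`, and `sample_matrix_normal` are hypothetical helpers written for this example only.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(x):
    return [list(col) for col in zip(*x)]


def sample_matrix_normal(mean, row_cov, col_cov, rng):
    """Draw one sample W = M + A Z B^T (assumed matrix normal noise model)."""
    n, p = len(mean), len(mean[0])
    a = cholesky(row_cov)  # A A^T = row covariance (n x n)
    b = cholesky(col_cov)  # B B^T = column covariance (p x p)
    z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    noise = matmul(matmul(a, z), transpose(b))
    return [[mean[i][j] + noise[i][j] for j in range(p)] for i in range(n)]


# Usage: perturb a 2 x 3 weight matrix whose two rows are positively correlated.
rng = random.Random(0)
mean = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
row_cov = [[1.0, 0.5], [0.5, 1.0]]
col_cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy_weights = sample_matrix_normal(mean, row_cov, col_cov, rng)
```

Under this assumed model the covariance between entries (i, j) and (k, l) is row_cov[i][k] * col_cov[j][l]. Setting both covariance matrices to diagonal matrices recovers independent per-weight noise, which is one way to view the fully factorized case as a special case of the structured one.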