Structural Parameter Space Exploration for Reinforcement Learning via a Matrix Variate Distribution

Cited by: 4

Authors:

Wang, Shaochen [1]

Yang, Rui [2]

Li, Bin [2]

Kan, Zhen [1]

Affiliations:

[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China

[2] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230052, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2023 / Vol. 7 / No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

Uncertainty; Neural networks; Covariance matrices; Space exploration; Noise measurement; Matrix decomposition; Symmetric matrices; Reinforcement learning; parameter space exploration; weight uncertainty;

DOI:

10.1109/TETCI.2022.3140380

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The trade-off between exploration and exploitation is essential for reinforcement learning, where an agent needs to be aware of when to explore for high reward policies and when to exploit the optimal policy known so far. Parameter space exploration provides an elegant solution. As one of the principal methods, injecting noise into the model parameters greatly improves exploration. However, directly stretching the parameters of the neural network into a vector and generating noise for this vector ignores the structural information of the model. In this paper, we aim to incorporate spatial information into weight matrices and propose matrix-variate noise exploration, which exploits the structural weight uncertainty brought by matrix variate noise to enhance the stochasticity of the agent. Indeed, we construct a bridge between the matrix noise exploration and probabilistic neural networks, which theoretically explains the improved performance of parameter space exploration. Extensive experiments have shown that matrix variate noise exploration outperforms fully factorized noisy exploration on most Atari tasks and Super Mario Bros tasks and is competitive with the state-of-the-art methods.
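
The contrast the abstract draws between fully factorized noise and matrix variate noise can be illustrated with a minimal sketch. It relies only on the standard construction of a matrix normal sample: if E has independent standard normal entries, then A E B^T has row covariance A A^T and column covariance B B^T. The function names, the factor matrices, and the example weights below are assumptions made for illustration; the abstract does not give the paper's actual parameterization or training procedure, so this is not the authors' implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def factorized_noise(rows, cols, sigma, rng):
    """Baseline: every weight gets independent N(0, sigma^2) noise."""
    return [[rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]


def matrix_variate_noise(row_factor, col_factor, rng):
    """Sample A @ E @ B^T with E i.i.d. standard normal.

    The result is matrix normal with row covariance A A^T and column
    covariance B B^T, so noise is correlated across rows and columns
    instead of being independent per entry.
    """
    rows, cols = len(row_factor), len(col_factor)
    e = factorized_noise(rows, cols, 1.0, rng)
    return matmul(matmul(row_factor, e), transpose(col_factor))


def perturb(weights, noise):
    """Add a noise matrix to a weight matrix of the same shape."""
    return [[w + n for w, n in zip(w_row, n_row)]
            for w_row, n_row in zip(weights, noise)]


# Hypothetical 2 x 3 weight matrix and lower-triangular covariance factors.
rng = random.Random(0)
weights = [[0.5, -0.2, 0.1],
           [0.3, 0.8, -0.4]]
row_factor = [[0.10, 0.00],
              [0.05, 0.10]]           # A, shape 2 x 2
col_factor = [[0.2, 0.0, 0.0],
              [0.1, 0.2, 0.0],
              [0.0, 0.1, 0.2]]        # B, shape 3 x 3
noisy = perturb(weights, matrix_variate_noise(row_factor, col_factor, rng))
```

With identity factors the sketch reduces to the fully factorized case, which is one way to see that per-entry independent noise is a special case of the structured noise.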

Pages: 1025 / 1035

Page count: 11
