Rethinking Stochasticity in Neural Networks for Reinforcement Learning with Continuous Actions

Cited by: 0
Authors
Shah, Syed Naveed Hussain [1]
Hougen, Dean Frederick [2]
Affiliations
[1] Microsoft Corp, MS Off, Redmond, WA 98052 USA
[2] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
Source
2019 IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2019) | 2019
Keywords
reinforcement learning; neural networks; stochasticity; gradient descent; REINFORCE
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we reconsider the use of stochasticity in neural networks for reinforcement learning in continuous action spaces. We consider stochasticity from both a reinforcement learning perspective and a neural networks perspective, leading us to reconsider whether noise sampling for exploration should take place at the level of the synapse (S), unit (U), or network (N). To investigate this question, we introduce a superset of the venerable multiparameter REINFORCE algorithm that we call REINFORCE SUN because it allows for stochasticity at each of these levels, and compare these variants on multidimensional problem sets with either continuous or discrete states. Our results show that moving stochasticity from the unit level to the synapse level substantially improves performance across all instances considered. As placing stochasticity at the unit level is nearly ubiquitous within the discipline, our results suggest that this standard practice should be reexamined more broadly.
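The three noise placements named in the abstract can be illustrated with a minimal sketch. This is not the authors' REINFORCE SUN algorithm and includes no learning rule; it only shows one plausible reading of where Gaussian exploration noise could enter a small two-layer network. The function names `act` and `matvec`, the `level` strings, the network shape, and the `sigma` value are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def act(x, W1, W2, rng, level, sigma=0.1):
    """Hypothetical two-layer policy; `level` picks where noise is sampled."""

    def eps():
        # One zero-mean Gaussian sample with standard deviation sigma.
        return rng.gauss(0.0, sigma)

    if level == "synapse":
        # Perturb every weight, then run an otherwise deterministic pass.
        W1 = [[w + eps() for w in row] for row in W1]
        W2 = [[w + eps() for w in row] for row in W2]
        h = [math.tanh(v) for v in matvec(W1, x)]
        return matvec(W2, h)
    if level == "unit":
        # Perturb the output of every unit, hidden and output alike.
        h = [math.tanh(v) + eps() for v in matvec(W1, x)]
        return [v + eps() for v in matvec(W2, h)]
    if level == "network":
        # Perturb only the final action vector produced by the network.
        h = [math.tanh(v) for v in matvec(W1, x)]
        return [v + eps() for v in matvec(W2, h)]
    raise ValueError(f"unknown level: {level}")


# Illustrative setup: 3 inputs, 4 hidden units, 2 continuous action dimensions.
rng = random.Random(0)
W1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
W2 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]
x = [1.0, 1.0, 1.0]
actions = {lvl: act(x, W1, W2, rng, lvl) for lvl in ("synapse", "unit", "network")}
```

With `sigma=0` all three variants reduce to the same deterministic forward pass, so any difference between them comes only from where the noise is sampled.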
Pages: 488-496
Page count: 9