Determining the optimal temperature parameter for Softmax function in reinforcement learning

Cited by: 43
Authors
He, Yu-Lin [1,2]
Zhang, Xiao-Liang [1,2]
Ao, Wei [1,2]
Huang, Joshua Zhexue [1,2]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Big Data Inst, Shenzhen 518060, Guangdong, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Softmax function; Temperature parameter; Probability vector; Reinforcement learning; D-armed bandit problem;
DOI
10.1016/j.asoc.2018.05.012
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The temperature parameter plays an important role in Softmax-based action selection, where the Softmax function transforms an original vector into a probability vector. This paper develops an efficient method, named Opti-Softmax, for determining the optimal temperature parameter of the Softmax function in reinforcement learning. First, a new evaluation function is designed to measure the effectiveness of a temperature parameter by considering both the information loss of the transformation and the diversity among the elements of the probability vector. Second, an iterative updating rule is derived that determines the optimal temperature parameter by minimizing the evaluation function. Finally, experimental results on synthetic data and D-armed bandit problems demonstrate the feasibility and effectiveness of the Opti-Softmax method. (C) 2018 Elsevier B.V. All rights reserved.
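To make the role of the temperature parameter concrete, the following is a minimal sketch of standard temperature-scaled Softmax action selection. It is not the Opti-Softmax method: the abstract does not give the evaluation function or the updating rule, so neither is reproduced here. The names `softmax`, `select_action`, and `action_values`, and the example values, are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(values, temperature):
    """Transform a real-valued vector into a probability vector.

    Standard temperature-scaled Softmax (an illustrative assumption,
    not the paper's Opti-Softmax procedure).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(values, temperature, rng=random):
    """Sample an arm index with probability given by the Softmax vector."""
    probs = softmax(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


# Hypothetical estimated rewards for a 3-armed bandit.
action_values = [1.0, 2.0, 3.0]

# A low temperature concentrates probability on the best arm.
low_temp_probs = softmax(action_values, 0.1)

# A high temperature pushes the vector toward uniform.
high_temp_probs = softmax(action_values, 100.0)

chosen_arm = select_action(action_values, 1.0)
```

A temperature that is too low makes selection nearly greedy, while one that is too high makes the arms almost indistinguishable. This is the trade-off that a principled choice of temperature, such as the one the paper proposes, is meant to resolve.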
Pages: 80-85
Page count: 6