Determining the optimal temperature parameter for Softmax function in reinforcement learning

Cited by: 43
Authors
He, Yu-Lin [1, 2]
Zhang, Xiao-Liang [1, 2]
Ao, Wei [1, 2]
Huang, Joshua Zhexue [1, 2]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Big Data Inst, Shenzhen 518060, Guangdong, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Softmax function; Temperature parameter; Probability vector; Reinforcement learning; D-armed bandit problem;
DOI
10.1016/j.asoc.2018.05.012
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The temperature parameter plays an important role in action selection based on the Softmax function, which transforms an original vector into a probability vector. This paper develops an efficient method, named Opti-Softmax, for determining the optimal temperature parameter of the Softmax function in reinforcement learning. First, a new evaluation function is designed to measure the effectiveness of a temperature parameter by considering both the information loss of the transformation and the diversity among the elements of the probability vector. Second, an iterative updating rule is derived that determines the optimal temperature parameter by finding the minimum of the evaluation function. Finally, experimental results on synthetic data and D-armed bandit problems demonstrate the feasibility and effectiveness of the Opti-Softmax method. © 2018 Elsevier B.V. All rights reserved.
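To illustrate the role of the temperature parameter described in the abstract, the following is a minimal Python sketch of a temperature-scaled Softmax. It is not the Opti-Softmax method: the abstract does not give the evaluation function or the updating rule, so none is reproduced here. The function names `softmax` and `entropy`, the reward values, and the use of Shannon entropy as a rough indicator of how spread out the probabilities are, are all assumptions made for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Transform a real-valued vector into a probability vector.

    Assumed form: p_i = exp(v_i / T) / sum_j exp(v_j / T), with T > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; larger means closer to uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical estimated rewards for a 3-armed bandit (illustrative only).
rewards = [1.0, 2.0, 3.0]

for t in (0.1, 1.0, 10.0):
    probs = softmax(rewards, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

In this sketch a small temperature concentrates probability on the arm with the largest estimated reward, while a large temperature pushes the probability vector toward uniform. A method for choosing the temperature has to balance these two extremes.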
Pages: 80-85
Page count: 6