VALUE FUNCTION ESTIMATION BASED ON AN ERROR GAUSSIAN MIXTURE MODEL

Cited: 0
Authors
Cui, Delong [1 ]
Peng, Zhiping [1 ]
Li, Qirui [1 ]
He, Jieguang [1 ]
Li, Kaibin [1 ]
Hung, Shangchao [2 ,3 ]
Affiliations
[1] Guangdong Univ Petrochem Technol, Coll Comp & Elect Informat, Maoming 525000, Guangdong, Peoples R China
[2] Fuzhou Univ, Fuzhou Polytech, Fuzhou 350108, Fujian, Peoples R China
[3] Intelligent Technol Res Ctr, Fuzhou 350108, Fujian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Value function estimation; error Gaussian mixture model; Gaussian process regression; reinforcement learning;
DOI
N/A
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
In reinforcement learning, the trade-off between exploration and exploitation in an agent's action selection has always been a key problem: the agent should not only exploit the action with the maximum estimated value but also explore potentially optimal actions. Inspired by this exploration-exploitation trade-off, a novel value function estimation algorithm based on an error Gaussian mixture model (EGMM) is proposed in this paper. First, appropriate variables are chosen from the error data, and the number of Gaussian components is determined by minimizing the Bayesian information criterion (BIC) of the EGMM. Then, the EGMM is fitted to the error data to obtain the conditional error mean, which is used to compensate the output and thus yield more accurate estimates. We test the performance of the designed algorithm on a virtual experimental platform in a cloud computing environment. Experiments demonstrate that the proposed algorithm eliminates the influence of non-Gaussian noise on the model's prediction performance.
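The error-compensation pipeline the abstract describes (fit a Gaussian mixture to model errors, pick the component count by BIC, then add the resulting error mean back onto the raw output) can be sketched as follows. This is a minimal illustration using scikit-learn, not the paper's implementation: the synthetic `errors` data is hypothetical, and for simplicity the marginal mixture mean is used in place of the paper's conditional error mean.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical 1-D error data: residuals between predicted and true values,
# drawn from a non-Gaussian (bimodal) distribution for illustration.
errors = np.concatenate([rng.normal(-1.0, 0.3, 500),
                         rng.normal(2.0, 0.5, 500)]).reshape(-1, 1)

# Select the number of Gaussian components by minimizing the
# Bayesian information criterion (BIC), as the abstract describes.
best_gmm, best_bic = None, np.inf
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(errors)
    bic = gmm.bic(errors)
    if bic < best_bic:
        best_gmm, best_bic = gmm, bic

# The fitted mixture's weighted mean serves as an error-compensation term:
# adding it to the raw model output corrects the systematic bias.
mean_error = float(best_gmm.means_.ravel() @ best_gmm.weights_)
raw_prediction = 10.0                      # hypothetical model output
compensated = raw_prediction + mean_error  # bias-corrected prediction
print(best_gmm.n_components, round(mean_error, 2))
```

With the bimodal errors above, BIC selects two components and the compensation term recovers the overall error bias that a single-Gaussian noise assumption would mis-model.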
Pages: 1687-1702
Number of pages: 16