MEC: A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems

Cited by: 63
Authors
Zhao, Dongbin [1 ]
Zhu, Yuanheng [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Efficient exploration; probably approximately correct (PAC); reinforcement learning (RL); state aggregation; TIME NONLINEAR-SYSTEMS; MODEL-BASED EXPLORATION; ZERO-SUM GAMES; CONTROL SCHEME;
DOI
10.1109/TNNLS.2014.2371046
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this paper, the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not rely on any knowledge of the system dynamics is proposed. It combines the state aggregation technique with the efficient exploration principle and makes efficient use of online observed samples. A grid partitions the continuous state space into cells, in which the samples are stored. A near-upper Q operator is defined to produce a near-upper Q function from the samples in each cell, and the corresponding greedy policy effectively balances exploration and exploitation. Through rigorous analysis, we prove a polynomial bound on the number of nonoptimal actions the algorithm executes. After finitely many steps, the final policy is near optimal in the PAC framework. The implementation requires no knowledge of the system and has low computational complexity. Simulation studies confirm that it performs better than other similar PAC algorithms.
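The grid-based state aggregation and "near-upper" (optimistic) Q function described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' MEC implementation: the class name `GridQ`, the `known_after` sample threshold, and all parameter choices are assumptions made for the example. The key ideas it shows are (1) mapping a continuous state to a grid cell and (2) initializing every cell-action value optimistically so that the greedy policy is drawn toward under-sampled regions, which is the standard optimism-driven exploration mechanism behind PAC-style algorithms.

```python
import numpy as np

class GridQ:
    """Illustrative grid aggregation with an optimistic Q function (not the authors' code)."""

    def __init__(self, low, high, bins, n_actions, v_max, gamma=0.95):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = np.asarray(bins, dtype=int)
        self.gamma = gamma
        # Optimistic initialization: every unknown cell-action pair looks
        # maximally rewarding, so the greedy policy explores it.
        self.q = np.full((int(np.prod(self.bins)), n_actions), v_max, dtype=float)
        self.counts = np.zeros_like(self.q, dtype=int)

    def cell(self, state):
        # Map a continuous state to the flat index of its grid cell.
        ratios = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((ratios * self.bins).astype(int), 0, self.bins - 1)
        return int(np.ravel_multi_index(idx, self.bins))

    def greedy_action(self, state):
        # Acting greedily w.r.t. an optimistic Q balances exploration and
        # exploitation without an explicit epsilon schedule.
        return int(np.argmax(self.q[self.cell(state)]))

    def update(self, s, a, r, s_next, known_after=5):
        c = self.cell(s)
        self.counts[c, a] += 1
        if self.counts[c, a] >= known_after:
            # Once a cell-action pair is sufficiently sampled, tighten the
            # optimistic value with a one-step Bellman backup estimate.
            target = r + self.gamma * np.max(self.q[self.cell(s_next)])
            self.q[c, a] = min(self.q[c, a], target)
```

Because the Q values only ever decrease from their optimistic initialization, an action's value drops below its neighbors once it is observed to be poor, and the greedy policy switches away from it; this mirrors, in simplified form, how an optimism-based method bounds the number of nonoptimal actions.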
Pages: 346-356
Page count: 11
References
42 in total
[11] Kakade, Sham. ICML, 2003.
[12] Kearns, M.; Singh, S. Near-optimal reinforcement learning in polynomial time. MACHINE LEARNING, 2002, 49(2-3): 209-232.
[13] Li, H.; Liu, D. Optimal control for discrete-time affine non-linear systems using general value iteration. IET CONTROL THEORY AND APPLICATIONS, 2012, 6(18): 2725-2736.
[14] Li, Hongliang; Liu, Derong; Wang, Ding. Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2014, 11(03): 706-714.
[15] Liu, Derong; Javaherian, Hossein; Kovalenko, Olesia; Huang, Ting. Adaptive critic learning techniques for engine torque and air-fuel ratio control. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38(04): 988-993.
[16] Liu, Derong; Wang, Ding; Wang, Fei-Yue; Li, Hongliang; Yang, Xiong. Neural-Network-Based Online HJB Solution for Optimal Robust Guaranteed Cost Control of Continuous-Time Uncertain Nonlinear Systems. IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44(12): 2834-2847.
[17] Liu, Derong; Li, Hongliang; Wang, Ding. Online Synchronous Approximate Optimal Learning Algorithm for Multiplayer Nonzero-Sum Games With Unknown Dynamics. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2014, 44(08): 1015-1027.
[18] Liu, Derong; Wei, Qinglai. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25(03): 621-634.
[19] Liu, Derong; Wang, Ding; Li, Hongliang. Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Control Approach. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25(02): 418-428.
[20] Liu, Derong; Wei, Qinglai. Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems. IEEE TRANSACTIONS ON CYBERNETICS, 2013, 43(02): 779-789.