Data-driven approximate value iteration with optimality error bound analysis

Cited by: 14
Authors
Li, Yongqiang [1 ]
Hou, Zhongsheng [2 ]
Feng, Yuanjing [1 ]
Chi, Ronghu [3 ]
Affiliations
[1] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou, Zhejiang, Peoples R China
[2] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Adv Control Syst Lab, Beijing, Peoples R China
[3] Qingdao Univ Sci & Technol, Sch Automat & Elect Engn, Qingdao, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data-driven control; Approximate dynamic programming; Domain of attraction; Asymptotic stabilization; TIME NONLINEAR-SYSTEMS; STABILITY; DESIGNS;
DOI
10.1016/j.automatica.2016.12.019
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The data-driven approximate value iteration (AVI) algorithm, proposed in Li et al. (2014) for the optimal stabilization problem, requires only process data and enlarges the estimate of the domain of attraction of the closed-loop system. However, the controller generated by the data-driven AVI algorithm is only an approximate solution of the optimal control problem. In this work, a quantitative bound on the error between the optimal cost and the cost under the designed controller is given. This error bound is determined by two quantities: the approximation error of the optimal-cost estimate and the approximation error of the controller function estimator. The former is in turn determined by the approximation error of the data-driven dynamic programming (DP) operator with respect to the exact DP operator and by the approximation error of the value function estimator. All three approximation errors vanish when the data set of the plant is sufficient and infinitely complete and the number of samples in the state space of interest is infinite; in that case, the cost under the designed controller equals the optimal cost as the number of iterations tends to infinity. (C) 2016 Elsevier Ltd. All rights reserved.
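To make the general scheme concrete, the sketch below shows generic fitted value iteration on recorded transition data, in the spirit of the algorithm the abstract describes. It is an illustration only, not the data-driven AVI algorithm of Li et al. (2014): the scalar plant, quadratic stage cost, discount factor, nearest-neighbor value function estimator, and finite candidate-input grid are all assumptions made for this example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical scalar plant, used only to record a data set; the
    # iteration below sees the recorded transitions only.
    def plant(x, u):
        return 0.9 * x + 0.5 * u + 0.1 * x**2

    def stage_cost(x, u):
        return x**2 + 0.1 * u**2            # assumed quadratic running cost

    # Recorded process data: (state, input, successor) triples.
    X  = rng.uniform(-1.0, 1.0, size=200)
    U  = rng.uniform(-1.0, 1.0, size=200)
    Xn = plant(X, U)

    gamma   = 0.95                          # assumed discount factor
    actions = np.linspace(-1.0, 1.0, 11)    # finite candidate-input grid
    V = np.zeros_like(X)                    # value estimate at the samples

    def V_hat(x):
        """1-nearest-neighbor value function estimator over the samples."""
        return V[np.argmin(np.abs(X - np.asarray(x)[..., None]), axis=-1)]

    # Data-driven surrogate for the model: for each (sample state, candidate
    # input) pair, predict the successor from the nearest recorded transition.
    succ = np.empty((len(X), len(actions)))
    for j, a in enumerate(actions):
        k = np.argmin(np.abs(X[:, None] - X[None, :]) + np.abs(a - U[None, :]), axis=1)
        succ[:, j] = Xn[k]

    # Approximate value iteration: Bellman backups at the sampled states.
    for _ in range(30):
        q = stage_cost(X[:, None], actions[None, :]) + gamma * V_hat(succ)
        V = q.min(axis=1)

    # Greedy controller induced by the final value estimate.
    def controller(x):
        k = np.argmin(np.abs(X - x)[None, :] + np.abs(actions[:, None] - U[None, :]), axis=1)
        return actions[np.argmin(stage_cost(x, actions) + gamma * V_hat(Xn[k]))]

    print("greedy input at x = 0.3:", controller(0.3))

In this sketch the nearest-neighbor surrogate plays the role the abstract assigns to the data-driven DP operator: as the recorded data set becomes richer and the samples denser, its error shrinks, and with it the gap between the achieved cost and the optimal cost, which is the effect the paper quantifies.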
Pages: 79-87
Page count: 9
References
23 in total
  • [1] Aha D.W. Artificial Intelligence Review, 1997, 11: 7. DOI: 10.1023/A:1006538427943
  • [2] Al-Tamimi A., Lewis F.L., Abu-Khalaf M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control. Automatica, 2007, 43(3): 473-481.
  • [3] Balakrishnan S.N., Ding J., Lewis F.L. Issues on stability of ADP feedback controllers for dynamical systems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 913-917.
  • [4] Bellman R.E. Dynamic Programming. Princeton Landmarks in Mathematics, 1957.
  • [5] Bertsekas D.P. Dynamic Programming, Vol. 2, 2001.
  • [6] He P., Jagannathan S. Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2007, 37(2): 425-436.
  • [7] Hou Z.-S., Wang Z. From model-based control to data-driven control: Survey, classification and perspective. Information Sciences, 2013, 235: 3-35.
  • [8] Hou Z., Jin S. Data-driven model-free adaptive control for a class of MIMO nonlinear discrete-time systems. IEEE Transactions on Neural Networks, 2011, 22(12): 2173-2188.
  • [9] Hou Z., Jin S. A novel data-driven control approach for a class of discrete-time nonlinear systems. IEEE Transactions on Control Systems Technology, 2011, 19(6): 1549-1558.
  • [10] Jiang Z.-P., Jiang Y. Robust adaptive dynamic programming for linear and nonlinear systems: An overview. European Journal of Control, 2013, 19(5): 417-425.