Near-Optimal Controller for Nonlinear Continuous-Time Systems With Unknown Dynamics Using Policy Iteration

Cited by: 21
Authors
Dutta, Samrat [1 ]
Patchaikani, Prem Kumar [2 ]
Behera, Laxmidhar [1 ]
Affiliations
[1] IIT Kanpur, Dept Elect Engn, Kanpur 208016, Uttar Pradesh, India
[2] GE India Technol Ctr, Bengaluru 560068, India
Keywords
Fuzzy Lyapunov function; nonlinear systems; policy iteration (PI); single-network adaptive critic (SNAC); system identification; MARKOV DECISION-PROCESSES; TRACKING CONTROL; ALGORITHM; DESIGN;
DOI
10.1109/TNNLS.2015.2451535
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a single-network adaptive critic-based controller for continuous-time systems with unknown dynamics in a policy iteration (PI) framework. It is assumed that the unknown dynamics can be estimated with arbitrary precision using the Takagi-Sugeno-Kang fuzzy model. The successful implementation of a PI scheme depends on the effective learning of the critic network parameters. The network parameters must stabilize the system in each iteration in addition to approximating the critic and the cost. It is found that critic updates based on the Hamilton-Jacobi-Bellman formulation sometimes lead to instability of the closed-loop system. In the proposed work, a novel critic network parameter update scheme is adopted that combines a Lyapunov-based linear matrix inequalities approach with PI. The scheme not only approximates the critic at the current iteration but also provides feasible solutions that keep the policy stable in the next step of training. The critic modeling technique presented here is the first of its kind to address this issue. Although several works discuss the convergence of PI, to the best of our knowledge none focuses on the effect of the critic network parameters on convergence. Computational complexity in the proposed algorithm is reduced to the order of $(F_z)^{n-1}$, where $n$ is the fuzzy state dimensionality and $F_z$ is the number of fuzzy zones in the state space. A genetic algorithm toolbox of MATLAB is used to search for stable parameters while minimizing the training error. The proposed algorithm also provides a way to solve for the initial stable control policy in the PI scheme. The algorithm is validated through a real-time experiment on a commercial robotic manipulator. Results show that the algorithm successfully finds stable critic network parameters in real time for a highly nonlinear system.
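To illustrate why each PI iterate must be stabilizing, the following is a minimal sketch of classical policy iteration for a scalar linear system with known dynamics. It is not the paper's method: the single-network adaptive critic, the fuzzy model, the linear matrix inequalities, and the genetic algorithm search are all omitted. The system dx/dt = a*x + b*u, the cost integrand q*x^2 + r*u^2, and every function and variable name below are assumptions made for illustration only.

```python
import math


def policy_iteration(a, b, q, r, k0, iters=20):
    """Policy iteration for dx/dt = a*x + b*u with feedback policy u = -k*x.

    Hypothetical helper: the cost is the integral of q*x^2 + r*u^2.
    Returns a list of (p, k) pairs, one per iteration.
    """
    k = k0
    history = []
    for _ in range(iters):
        closed_loop = a - b * k
        # Policy evaluation is only defined for a stabilizing policy,
        # which is why PI needs a stable initial gain.
        if closed_loop >= 0:
            raise ValueError("policy is not stabilizing")
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k^2 = 0 for p.
        p = (q + r * k * k) / (-2.0 * closed_loop)
        # Policy improvement: k = b*p / r.
        k = b * p / r
        history.append((p, k))
    return history


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b*p)^2/r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Assumed example values: an open-loop unstable plant (a > 0).
A, B, Q, R = 1.0, 1.0, 1.0, 1.0
HISTORY = policy_iteration(A, B, Q, R, k0=3.0)
P_FINAL, K_FINAL = HISTORY[-1]
P_STAR = riccati_solution(A, B, Q, R)
```

In this sketch, starting from the stabilizing gain k0 = 3.0 drives the evaluated cost parameter toward the Riccati solution, while a non-stabilizing start such as k0 = 0.5 is rejected at the evaluation step.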
Pages: 1537-1549
Page count: 13
Related papers
37 in total
[1]  
[Anonymous], 2007, P 13 AM C INF SYST A
[2]   Issues on stability of ADP feedback controllers for dynamical systems [J].
Balakrishnan, S. N. ;
Ding, Jie ;
Lewis, Frank L. .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04) :913-917
[3]   A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes [J].
Bhatnagar, S ;
Kumar, S .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, 49 (04) :592-598
[4]   Partial-Information State-Based Optimization of Partially Observable Markov Decision Processes and the Separation Principle [J].
Cao, Xi-Ren ;
Wang, De-Xin ;
Qiu, Li .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (04) :921-936
[5]   Evolutionary policy iteration for solving Markov decision processes [J].
Chang, HS ;
Lee, HG ;
Fu, MC ;
Marcus, SI .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2005, 50 (11) :1804-1808
[6]   Online Optimal Control of Affine Nonlinear Discrete-Time Systems With Unknown Internal Dynamics by Using Time-Based Policy Update [J].
Dierks, Travis ;
Jagannathan, Sarangapani .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2012, 23 (07) :1118-1129
[7]   Single Network Approximate Dynamic Programming based Constrained Optimal Controller for Nonlinear Systems with Uncertainties [J].
Ding, Jie ;
Balakrishnan, S. N. .
49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, :3054-3059
[8]  
Dutta S., 2013, P IEEE MULT SYST CON, P358
[9]  
Dutta S, 2014, IEEE INT FUZZY SYST, P98, DOI 10.1109/FUZZ-IEEE.2014.6891793
[10]   About the use of fuzzy clustering techniques for fuzzy model identification [J].
Gómez-Skarmeta, AF ;
Delgado, M ;
Vila, MA .
FUZZY SETS AND SYSTEMS, 1999, 106 (02) :179-188