Incremental Approximate Dynamic Programming for Nonlinear Adaptive Tracking Control with Partial Observability

Cited by: 32
Authors
Zhou, Ye [1, 2]
van Kampen, Erik-Jan [1]
Chu, QiPing [1]
Affiliations
[1] Delft Univ Technol, Control & Operat Div, Aerosp Engn, NL-2629 HS Delft, South Holland, Netherlands
[2] Univ Sains Malaysia, Sch Aerosp Engn, George Town, Malaysia
Keywords
MODEL-PREDICTIVE CONTROL; OUTPUT-FEEDBACK CONTROL; FLIGHT CONTROL; SPACECRAFT; INVERSION; STATE;
DOI
10.2514/1.G003472
Chinese Library Classification number
V [Aeronautics, Astronautics];
Subject classification code
08 ; 0825 ;
Abstract
Approximate dynamic programming is a class of reinforcement learning, which solves adaptive, optimal control problems and tackles the curse of dimensionality with function approximators. Within this category, linear approximate dynamic programming provides a model-free control method by systematically using a quadratic cost-to-go function. Although efficient, linear approximate dynamic programming methods are difficult to apply to nonlinear systems or time-varying systems. To overcome the above limitations, this paper proposes an adaptive nonlinear tracking control method based on incremental approximate dynamic programming, which combines the advantages of linear approximate dynamic programming and incremental nonlinear control techniques. This is a model-free method for unknown, nonlinear systems and time-varying references. The trait of separating the local model information from the cost function approximation makes this method an option for partially observable control problems. This paper, therefore, proposes two reference tracking controllers for different observability conditions: the direct measurement of the full state, and the partially observable tracking error. In each condition, two algorithms are developed for off-line learning and online learning, respectively. These algorithms are applied to attitude control of a spacecraft disturbed by internal liquid sloshing. The results demonstrate that the proposed algorithms accurately deal with the unknown, time-varying internal dynamics while retaining efficient control, even with only partial observability.
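The abstract does not spell out how the local model information is obtained. As a minimal sketch only, assuming the local model is a first-order linear relation between state and input increments, dx[k+1] ≈ F dx[k] + G du[k], fitted by ordinary least squares, the identification step might look like the following. The function name `identify_incremental_model`, the toy matrices `A` and `B`, and the sample count are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def identify_incremental_model(xs, us):
    """Fit dx[k+1] ~ F dx[k] + G du[k] by ordinary least squares.

    xs: array of shape (T+1, n) holding states x_0..x_T
    us: array of shape (T, m) holding inputs u_0..u_{T-1}
    Returns (F, G) with shapes (n, n) and (n, m).
    """
    n = xs.shape[1]
    dx = np.diff(xs, axis=0)        # dx[k] = x[k+1] - x[k], shape (T, n)
    du = np.diff(us, axis=0)        # du[k] = u[k+1] - u[k], shape (T-1, m)
    phi = np.hstack([dx[:-1], du])  # regressors, shape (T-1, n+m)
    target = dx[1:]                 # next-step increments, shape (T-1, n)
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return theta[:n].T, theta[n:].T

# Toy check on an assumed linear system x[k+1] = A x[k] + B u[k],
# whose increments satisfy the incremental relation exactly.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
T = 50
us = rng.normal(size=(T, 1))
xs = np.zeros((T + 1, 2))
for k in range(T):
    xs[k + 1] = A @ xs[k] + B @ us[k]

F_hat, G_hat = identify_incremental_model(xs, us)
```

On this noise-free toy system the fit recovers `A` and `B` up to numerical precision. For a nonlinear or time-varying plant, one would presumably repeat the fit over a short sliding window of recent samples so that the estimates follow the local dynamics; that windowing is not shown here.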
Pages: 2554-2567
Page count: 14