Continuous action reinforcement learning for control-affine systems with unknown dynamics

Cited by: 14
Authors
Faust, Aleksandra [1 ]
Ruymgaart, Peter [1 ]
Salman, Molly [2 ]
Fierro, Rafael [3 ]
Tapia, Lydia [1 ]
Affiliations
[1] Department of Computer Science, University of New Mexico, Albuquerque, NM 87131
[2] Computer Science Department, Austin College, Sherman, TX 75090
[3] Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131
Funding
US National Institutes of Health; US National Science Foundation
Keywords
approximate value iteration; continuous action spaces; control-affine nonlinear systems; fitted value iteration; policy approximation; reinforcement learning
DOI
10.1109/JAS.2014.7004690
Abstract
Control of nonlinear systems is challenging in real time. Decision making, performed many times per second, must ensure system safety. Designing an input to perform a task often requires solving a nonlinear system of differential equations, a computationally intensive, if not intractable, problem. This article proposes sampling-based task learning for control-affine nonlinear systems through the combined learning of state and action-value functions in a model-free approximate value iteration setting with continuous inputs. A quadratic negative definite state-value function implies the existence of a unique maximum of the action-value function at any state. This allows the standard greedy policy to be replaced with a computationally efficient policy approximation that guarantees progression to a goal state without knowledge of the system dynamics. The policy approximation is consistent, i.e., it does not depend on the action samples used to calculate it. The method is appropriate for mechanical systems with high-dimensional input spaces and unknown dynamics performing Constraint-Balancing Tasks. We verify it both in simulation and experimentally for an unmanned aerial vehicle (UAV) carrying a suspended load, and in simulation for the rendezvous of heterogeneous robots. © 2014 Chinese Association of Automation.
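The abstract's central computational claim can be made concrete. Because a quadratic negative definite state-value function gives the action-value function a unique maximum at every state, the greedy argmax over a continuous action space reduces to fitting a concave quadratic from a few samples and taking its vertex in closed form. The Python sketch below is a minimal illustration of that idea, not the authors' published algorithm; the names Q and a_max and the axis-by-axis decomposition of the action space are assumptions introduced here.

```python
import numpy as np

def approx_greedy_action(Q, state, a_max, n_samples=3):
    """Illustrative policy approximation (hypothetical interface).

    Q      -- learned action-value function, Q(state, action) -> float,
              assumed concave and quadratic along each action axis
    a_max  -- per-axis action bounds: action[i] lies in [-a_max[i], a_max[i]]
    """
    dim = len(a_max)
    action = np.zeros(dim)
    for i in range(dim):
        # Evaluate Q along the i-th action axis only.
        offsets = np.linspace(-a_max[i], a_max[i], n_samples)
        q_vals = []
        for u in offsets:
            a = np.zeros(dim)
            a[i] = u
            q_vals.append(Q(state, a))
        # Fit q(u) = c2*u^2 + c1*u + c0; for a concave quadratic the
        # vertex -c1/(2*c2) is the unique maximizer. Three samples
        # determine the quadratic exactly, so the result does not
        # depend on which samples were drawn -- the consistency
        # property claimed in the abstract.
        c2, c1, _ = np.polyfit(offsets, q_vals, 2)
        action[i] = np.clip(-c1 / (2.0 * c2), -a_max[i], a_max[i])
    return action
```

Treating each input axis independently and combining the per-axis maximizers into one action vector is a simplification that is reasonable only when the action-value function separates across inputs; a full implementation would need to justify that step for the system at hand.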
Pages: 323-336
Number of pages: 13