Kinodynamic Motion Planning With Continuous-Time Q-Learning: An Online, Model-Free, and Safe Navigation Framework

Cited by: 63
Authors
Kontoudis, George P. [1 ]
Vamvoudakis, Kyriakos G. [2 ]
Affiliations
[1] Virginia Tech, Kevin T Crofton Dept Aerosp & Ocean Engn, Blacksburg, VA 24060 USA
[2] Georgia Tech, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332 USA
Keywords
Planning; Heuristic algorithms; Optimal control; Dynamics; System dynamics; Computational modeling; Navigation; Actor-critic network; asymptotic optimality; online motion planning; Q-learning; LINEAR-SYSTEMS; DESIGN
DOI
10.1109/TNNLS.2019.2899311
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper presents an online kinodynamic motion planning algorithmic framework that combines the asymptotically optimal rapidly exploring random tree (RRT*) with continuous-time Q-learning, which we term RRT-Q*. We formulate a model-free Q-based advantage function and use integral reinforcement learning to develop tuning laws for the online approximation of the optimal cost and the optimal policy of continuous-time linear systems. Moreover, we provide rigorous Lyapunov-based proofs of the stability of the equilibrium point, which yield asymptotic convergence properties. A terminal-state evaluation procedure is introduced to facilitate the online implementation. We propose a static obstacle augmentation and a local replanning framework, both based on topological connectedness, to locally recompute the robot's path and ensure collision-free navigation. We perform simulations and a qualitative comparison to evaluate the efficacy of the proposed methodology.
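The abstract couples a sampling-based global planner with a learned local controller. As a hedged illustration of the sampling-based side only, the sketch below is a minimal geometric RRT in an obstacle-free 2-D workspace — it is not the paper's kinodynamic RRT* or its Q-learning controller, and the workspace bounds, step size, goal bias, and `is_free` collision predicate are illustrative assumptions.

```python
import math
import random

def rrt(start, goal, is_free, step=0.5, goal_tol=0.5, max_iters=2000, seed=0):
    """Grow a tree from `start` toward `goal` in a 10x10 2-D workspace.

    Returns a list of waypoints from start to a node within `goal_tol`
    of the goal, or None if the iteration budget is exhausted.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point, biased toward the goal 10% of the time.
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 10), rng.uniform(0, 10))
        # Steer a bounded step from the nearest tree node toward the sample.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d, ny + step * (sample[1] - ny) / d)
        if not is_free(new):
            continue  # discard samples that land in an obstacle
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            # Reconstruct the path by walking parent pointers to the root.
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None
```

A kinodynamic variant would replace the straight-line steering step with a forward simulation of the system dynamics, which is where the paper's model-free Q-learning controller enters.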
Pages: 3803-3817
Page count: 15