Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method

Cited by: 0
Authors
Lidong ZHANG [1,2]
Ban WANG [2]
Zhixiang LIU [2]
Youmin ZHANG [2]
Jianliang AI [1]
Affiliations
[1] Department of Aeronautics and Astronautics, Fudan University
[2] Department of Mechanical, Industrial and Aerospace Engineering, Concordia University
Keywords
Reinforcement learning; Approximate dynamic programming; Decision making; Motion planning; Unmanned aerial vehicle
DOI
Not available
CLC number
TP242 [Robotics]
Discipline classification code
1111
Abstract
Making rational decisions for sequential decision problems in complex environments has challenged researchers in various fields for decades. Such problems involve state transition dynamics, stochastic uncertainties, long-term utilities, and other factors that erect high barriers, including the curse of dimensionality. Recently developed state-of-the-art reinforcement learning algorithms offer strong potential to break these barriers efficiently, making it possible to handle complex, practical decision problems with decent performance. We propose a formulation of a velocity-varying one-on-one quadrotor robot game in three-dimensional space, together with an approximate dynamic programming approach that uses a projected policy iteration method to learn the utilities of game states and to improve motion policies. In addition, a simulation-based iterative scheme is employed to overcome the curse of dimensionality. Simulation results demonstrate that the proposed decision strategy generates effective and efficient motion policies that contend with the opponent quadrotor and gain an advantageous status during the game. Flight experiments conducted in the Networked Autonomous Vehicles (NAV) Lab at Concordia University further validate the performance of the proposed decision strategy in a real-time environment.
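The abstract's core algorithmic idea, evaluating a policy by projecting its value function onto a low-dimensional feature space using simulated transitions and then improving the policy greedily, can be illustrated with a minimal sketch. Everything below (the random finite MDP, the feature map Phi, the sizes and sample counts) is an illustrative assumption for a generic problem, not the paper's actual quadrotor-game formulation:

```python
import numpy as np

# Minimal sketch of simulation-based projected policy iteration on a random
# finite MDP. All names and sizes here (n_states, Phi, sample counts, the MDP
# itself) are illustrative assumptions, not the paper's quadrotor-game model.

rng = np.random.default_rng(0)
n_states, n_actions, n_features = 50, 4, 8
gamma = 0.95

# Random MDP: transition kernel P[s, a] (a distribution over next states)
# and immediate rewards R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

# Fixed feature map phi(s); value functions are projected onto span{Phi}.
Phi = rng.normal(size=(n_states, n_features))

def evaluate(policy, n_samples=5000):
    """Projected policy evaluation: estimate A = E[phi (phi - gamma phi')^T]
    and b = E[phi r] from a single simulated trajectory, then solve A w = b
    (an LSTD-style solution of the projected Bellman equation)."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    s = rng.integers(n_states)
    for _ in range(n_samples):
        a = policy[s]
        s_next = rng.choice(n_states, p=P[s, a])
        phi, phi_next = Phi[s], Phi[s_next]
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * R[s, a]
        s = s_next
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    return Phi @ w  # approximate state values under the current policy

def improve(V):
    """Greedy policy improvement (uses the known model here for brevity; a
    fully simulation-based variant would estimate Q from short rollouts)."""
    Q = R + gamma * (P @ V)  # Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
    return Q.argmax(axis=1)

policy = rng.integers(n_actions, size=n_states)
for _ in range(10):
    V = evaluate(policy)
    new_policy = improve(V)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```

The projection is what makes the evaluation step scale: the least-squares system is only n_features x n_features and is built purely from sampled transitions, so its cost does not grow with the raw size of the state space, which is the mechanism the abstract credits for overcoming the curse of dimensionality.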
Pages: 525-537
Page count: 13