Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method

Cited by: 0
Authors
Li-dong Zhang
Ban Wang
Zhi-xiang Liu
You-min Zhang
Jian-liang Ai
Affiliations
[1] Fudan University,Department of Aeronautics and Astronautics
[2] Concordia University,Department of Mechanical, Industrial and Aerospace Engineering
Source
Frontiers of Information Technology & Electronic Engineering | 2019 / Vol. 20
Keywords
Reinforcement learning; Approximate dynamic programming; Decision making; Motion planning; Unmanned aerial vehicle
DOI
Not available
Chinese Library Classification
TP242
Abstract
Making rational decisions in sequential decision problems within complex environments has challenged researchers in various fields for decades. Such problems involve state transition dynamics, stochastic uncertainties, long-term utilities, and other factors that raise high barriers, including the curse of dimensionality. Recently developed state-of-the-art reinforcement learning algorithms offer strong potential to break these barriers efficiently and make it possible to handle complex, practical decision problems with decent performance. We propose a formulation of a velocity-varying one-on-one quadrotor robot game in three-dimensional space, together with an approximate dynamic programming approach that uses a projected policy iteration method to learn the utilities of game states and improve motion policies. In addition, a simulation-based iterative scheme is employed to overcome the curse of dimensionality. Simulation results demonstrate that the proposed decision strategy generates effective and efficient motion policies that contend with the opponent quadrotor and gain an advantageous status during the game. Flight experiments conducted in the Networked Autonomous Vehicles (NAV) Lab at Concordia University further validate the performance of the proposed decision strategy in a real-time environment.
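The core loop described in the abstract — evaluate a policy by simulation, project the simulated utilities onto a low-dimensional subspace, then improve the policy greedily — can be illustrated with a minimal sketch. This is not the paper's quadrotor game formulation; it is a toy one-dimensional pursuit MDP with linear value-function features, and all names, dynamics, and parameters here are illustrative assumptions.

```python
# Hedged sketch of simulation-based projected policy iteration on a toy
# 1-D pursuit problem (illustrative only; NOT the paper's quadrotor game).
import numpy as np

rng = np.random.default_rng(0)

N = 11                      # positions 0..10; the agent tries to reach position 10
ACTIONS = [-1, 0, 1]
GAMMA = 0.95

def step(s, a):
    """One stochastic state transition with a small random drift."""
    drift = rng.choice([-1, 0, 1], p=[0.1, 0.8, 0.1])
    s2 = min(max(s + a + drift, 0), N - 1)
    r = 1.0 if s2 == N - 1 else -0.01    # reward for reaching/holding the goal
    return s2, r

def features(s):
    """Low-dimensional features: the subspace the utilities are projected onto."""
    x = s / (N - 1)
    return np.array([1.0, x, x * x])

def rollout_return(s, policy, horizon=30):
    """Monte-Carlo (simulation-based) estimate of the utility of state s."""
    g, disc = 0.0, 1.0
    for _ in range(horizon):
        s, r = step(s, policy[s])
        g += disc * r
        disc *= GAMMA
    return g

policy = np.zeros(N, dtype=int)          # initial policy: stand still
for it in range(10):
    # Policy evaluation: project simulated returns onto the feature subspace
    # by least squares (the "projected" step of projected policy iteration).
    Phi = np.array([features(s) for s in range(N)])
    y = np.array([np.mean([rollout_return(s, policy) for _ in range(30)])
                  for s in range(N)])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    # Policy improvement: sampled one-step lookahead against the fitted value.
    def q(s, a, samples=20):
        tot = 0.0
        for _ in range(samples):
            s2, r = step(s, a)
            tot += r + GAMMA * features(s2) @ w
        return tot / samples

    policy = np.array([max(ACTIONS, key=lambda a: q(s, a)) for s in range(N)])

print(policy)   # interior states should mostly choose +1 (move toward the goal)
```

The least-squares fit replaces the exact (tabular) policy evaluation, which is what makes the method tractable when the state space is too large to enumerate; the rollout-based estimates are the simulation component that the abstract credits with overcoming the curse of dimensionality.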
Pages: 525-537
Page count: 12