A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment

Cited by: 174
Authors
Hung, Shao-Ming [1]
Givigi, Sidney N. [1]
Affiliations
[1] Royal Military College of Canada, Department of Electrical and Computer Engineering, Kingston, ON K7K 7B4, Canada
Keywords
Flocking; Q-learning; reinforcement learning (RL); unmanned aerial vehicles (UAVs)
DOI
10.1109/TCYB.2015.2509646
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Subject classification code
0812
Abstract
In the past two decades, unmanned aerial vehicles (UAVs) have demonstrated their efficacy in supporting both military and civilian applications, where tasks can be dull, dirty, dangerous, or simply too costly with conventional methods. Many of these applications contain tasks that can be executed in parallel, so the natural progression is to deploy multiple UAVs working together as a force multiplier. Doing so, however, requires autonomous coordination among the UAVs, similar to the swarming behaviors seen in animals and insects. This paper treats flocking with small fixed-wing UAVs as a model-free reinforcement learning problem. In particular, Peng's Q(λ) with a variable learning rate is employed by the followers to learn a control policy that facilitates flocking in a leader-follower topology. The problem is structured as a Markov decision process, where the agents are modeled as small fixed-wing UAVs that experience stochasticity due to disturbances such as wind and control noise, as well as weight and balance issues. Learned policies are compared to policies solved using stochastic optimal control (i.e., dynamic programming) by evaluating the average cost incurred during flight according to a cost function. Simulation results demonstrate the feasibility of the proposed learning approach in enabling agents to learn how to flock in a leader-follower topology while operating in a nonstationary stochastic environment.
Pages: 186–197
Page count: 12
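
Note: the abstract names Peng's Q(λ) with a variable learning rate as the followers' learning rule. For readers unfamiliar with the variant, the Python sketch below implements the standard tabular formulation of Peng's Q(λ) (Peng and Williams, 1996); the ε-greedy exploration, the visit-count learning-rate schedule, and all hyperparameter values are illustrative assumptions, not the configuration used in the paper.

import numpy as np

# Minimal tabular sketch of Peng's Q(lambda) with a variable learning
# rate. The epsilon-greedy policy, the visit-count alpha schedule, and
# every hyperparameter value are illustrative assumptions; they are NOT
# the configuration reported in the paper.

class PengQLambdaAgent:
    def __init__(self, n_states, n_actions, gamma=0.95, lam=0.8,
                 epsilon=0.1, omega=0.6, seed=0):
        self.Q = np.zeros((n_states, n_actions))       # action-value table
        self.e = np.zeros((n_states, n_actions))       # eligibility traces
        self.visits = np.zeros((n_states, n_actions))  # for variable alpha
        self.gamma, self.lam = gamma, lam
        self.epsilon, self.omega = epsilon, omega
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        # Epsilon-greedy exploration (an assumed exploration scheme).
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def alpha(self, s, a):
        # Assumed variable learning rate: a polynomial decay in the
        # visit count, a common schedule but not necessarily the paper's.
        return 1.0 / (1.0 + self.visits[s, a]) ** self.omega

    def update(self, s, a, r, s_next):
        self.visits[s, a] += 1
        lr = self.alpha(s, a)
        v_next = np.max(self.Q[s_next])
        # Peng's two TD errors (Peng & Williams, 1996):
        delta = r + self.gamma * v_next - np.max(self.Q[s])  # greedy-to-greedy
        delta_sa = r + self.gamma * v_next - self.Q[s, a]    # one-step correction
        # Decay all traces, then back delta up through them.
        self.e *= self.gamma * self.lam
        self.Q += lr * delta * self.e
        # Extra one-step update for the current pair, then bump its trace.
        self.Q[s, a] += lr * delta_sa
        self.e[s, a] += 1.0

    def end_episode(self):
        self.e[:] = 0.0  # traces do not carry across episodes

Unlike Watkins' Q(λ), Peng's variant does not cut the eligibility traces to zero on exploratory actions, which speeds credit propagation in stochastic environments such as the wind-disturbed flight considered here, at the cost of no longer being a strictly off-policy method.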