A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs

Times Cited: 0
Authors
Wang, Chang [1 ]
Yan, Chao [1 ]
Xiang, Xiaojia [1 ]
Zhou, Han [1 ]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
Source
ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101 | 2019, Vol. 101
Funding
National Natural Science Foundation of China;
Keywords
unmanned aerial vehicle (UAV); flocking; reinforcement learning; actor-critic; experience replay; AGENTS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Controlling a squad of fixed-wing UAVs is challenging due to the complexity of their kinematics and the dynamics of the environment. In this paper, we develop a novel actor-critic reinforcement learning approach to solve the leader-follower flocking problem in continuous state and action spaces. Specifically, we propose the CACER algorithm, which uses multilayer perceptrons to represent both the actor and the critic; this deeper structure provides a better function approximator than the original continuous actor-critic learning automaton (CACLA) algorithm. In addition, we propose a double prioritized experience replay (DPER) mechanism to further improve training efficiency: state-transition samples are saved into two separate experience replay buffers for updating the actor and the critic independently, with sample priorities computed from temporal-difference errors. We not only compare CACER with CACLA and the benchmark deep reinforcement learning algorithm DDPG in numerical simulation, but also demonstrate the performance of CACER in semi-physical simulation by transferring the policy learned in numerical simulation without any parameter tuning.
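For readers who want a concrete picture of the ingredients named in the abstract, the minimal NumPy sketch below shows one plausible reading of a CACER-style update with double prioritized experience replay. It is not the authors' implementation: the network sizes, learning rate, buffer capacity, batch size, exploration noise, and the 4-D state / 1-D action layout are illustrative assumptions; only the CACLA-style rule (update the actor toward an exploratory action only when its temporal-difference error is positive) and the two TD-error-prioritized buffers follow the abstract's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Initialize a small multilayer perceptron as a list of (W, b) layers."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Forward pass: tanh hidden layers, linear output."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def mlp_sgd_step(params, x, target, lr=1e-3):
    """One SGD step of MSE regression pulling the MLP output at x toward target."""
    acts = [x]                                   # cache layer inputs for backprop
    for W, b in params[:-1]:
        acts.append(np.tanh(acts[-1] @ W + b))
    W_out, b_out = params[-1]
    out = acts[-1] @ W_out + b_out
    delta = out - target                         # dL/d(out) for 0.5*(out - target)^2
    grads = []
    for i in range(len(params) - 1, -1, -1):
        W, _ = params[i]
        grads.append((np.outer(acts[i], delta), delta))
        if i > 0:                                # backprop through the tanh layers
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
    for (W, b), (gW, gb) in zip(params, reversed(grads)):
        W -= lr * gW
        b -= lr * gb

class PrioritizedBuffer:
    """Replay buffer that samples items in proportion to their |TD error|."""
    def __init__(self, capacity=10_000):
        self.data, self.prio, self.capacity = [], [], capacity
    def add(self, item, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prio.pop(0)
        self.data.append(item)
        self.prio.append(abs(td_error) + 1e-6)   # keep every priority positive
    def sample(self, batch_size):
        if not self.data:
            return []
        p = np.asarray(self.prio)
        idx = rng.choice(len(self.data), size=min(batch_size, len(self.data)),
                         replace=False, p=p / p.sum())
        return [self.data[i] for i in idx]

# Illustrative dimensions: a 4-D follower state (e.g. position/heading error
# relative to the leader) and a 1-D continuous control (e.g. a roll command).
actor, critic = mlp_init([4, 64, 64, 1]), mlp_init([4, 64, 64, 1])
actor_buf, critic_buf = PrioritizedBuffer(), PrioritizedBuffer()
gamma = 0.99

def cacer_step(s, a, r, s_next, done, batch=16):
    """One learning step: DPER storage plus CACLA-style actor/critic updates."""
    v, v_next = mlp_forward(critic, s)[0], mlp_forward(critic, s_next)[0]
    delta = r + (0.0 if done else gamma * v_next) - v     # TD error
    critic_buf.add((s, r, s_next, done), delta)           # critic sees everything
    if delta > 0:                                         # CACLA rule: keep only
        actor_buf.add((s, a), delta)                      # better-than-expected actions
    for s_i, r_i, sn_i, d_i in critic_buf.sample(batch):  # critic -> TD target
        tgt = r_i + (0.0 if d_i else gamma * mlp_forward(critic, sn_i)[0])
        mlp_sgd_step(critic, s_i, tgt)
    for s_i, a_i in actor_buf.sample(batch):              # actor -> good actions
        mlp_sgd_step(actor, s_i, a_i)

# One simulated interaction with Gaussian exploration noise on the action.
s = rng.normal(size=4)
a = mlp_forward(actor, s) + rng.normal(0.0, 0.1, size=1)
s_next, r = rng.normal(size=4), -float(np.linalg.norm(s))
cacer_step(s, a, r, s_next, done=False)
```

In this sketch the critic is trained by regression toward the one-step TD target and the actor by regression toward exploratory actions that outperformed the current value estimate; this sign-based rule is how CACLA-family methods sidestep the deterministic policy gradient used by DDPG. The "double" in DPER refers to keeping separate prioritized buffers for the two networks.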
Pages: 64-79
Number of Pages: 16
Related Papers
18 items in total
  • [1] Hou, Y. N. 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2017: 316. DOI: 10.1109/SMC.2017.8122622
  • [2] La, Hung Manh; Lim, Ronny; Sheng, Weihua. Multirobot Cooperative Learning for Predator Avoidance. IEEE Transactions on Control Systems Technology, 2015, 23(1): 52-63.
  • [3] Hung, Shao-Ming; Givigi, Sidney N. A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment. IEEE Transactions on Cybernetics, 2017, 47(1): 186-197.
  • [4] Hung, Shao-Ming; Givigi, Sidney; Noureldin, Aboelmagd. A Dyna-Q(λ) Approach to Flocking with Fixed-Wing UAVs in a Stochastic Environment. 2015 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2015: 1918-1923.
  • [5] Kingma, D. P. 2014, arXiv.
  • [6] Morihiro, Koichiro. 2006 SICE-ICASE International Joint Conference, 2006: 4551.
  • [7] Lillicrap, T., et al. Continuous control with deep reinforcement learning. 2015, arXiv. DOI: 10.48550/arXiv.1509.02971
  • [9] Ma, Zhaowei; Wang, Chang; Niu, Yifeng; Wang, Xiangke; Shen, Lincheng. A saliency-based reinforcement learning approach for a UAV to avoid flying obstacles. Robotics and Autonomous Systems, 2018, 100: 108-118.
  • [10] Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533.