A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs

Times Cited: 0
Authors
Wang, Chang [1 ]
Yan, Chao [1 ]
Xiang, Xiaojia [1 ]
Zhou, Han [1 ]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
Source
ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101 | 2019, Vol. 101
Funding
National Natural Science Foundation of China;
Keywords
unmanned aerial vehicle (UAV); flocking; reinforcement learning; actor-critic; experience replay; AGENTS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Controlling a squad of fixed-wing UAVs is challenging due to the complexity of their kinematics and the dynamics of the environment. In this paper, we develop a novel actor-critic reinforcement learning approach to solve the leader-follower flocking problem in continuous state and action spaces. Specifically, we propose the CACER algorithm, which uses multilayer perceptrons to represent both the actor and the critic; this deeper structure provides a better function approximator than the original continuous actor-critic learning automaton (CACLA) algorithm. In addition, we propose a double prioritized experience replay (DPER) mechanism to further improve training efficiency: state-transition samples are saved into two separate experience replay buffers for updating the actor and the critic independently, with sample priorities computed from temporal-difference errors. We not only compare CACER with CACLA and the benchmark deep reinforcement learning algorithm DDPG in numerical simulation, but also demonstrate the performance of CACER in semi-physical simulation by transferring the policy learned in numerical simulation without any parameter tuning.
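For readers who want a concrete picture of the ingredients named in the abstract, the minimal NumPy sketch below shows one plausible reading of a CACER-style update with double prioritized experience replay. It is not the authors' implementation: the network sizes, learning rate, buffer capacity, batch size, exploration noise, and the 4-D state / 1-D action layout are illustrative assumptions; only the CACLA-style rule (update the actor toward an exploratory action only when its temporal-difference error is positive) and the two TD-error-prioritized buffers follow the abstract's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Initialize a small multilayer perceptron as a list of (W, b) layers."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Forward pass: tanh hidden layers, linear output."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def mlp_sgd_step(params, x, target, lr=1e-3):
    """One SGD step of MSE regression pulling the MLP output at x toward target."""
    acts = [x]                                   # cache layer inputs for backprop
    for W, b in params[:-1]:
        acts.append(np.tanh(acts[-1] @ W + b))
    W_out, b_out = params[-1]
    out = acts[-1] @ W_out + b_out
    delta = out - target                         # dL/d(out) for 0.5*(out - target)^2
    grads = []
    for i in range(len(params) - 1, -1, -1):
        W, _ = params[i]
        grads.append((np.outer(acts[i], delta), delta))
        if i > 0:                                # backprop through the tanh layers
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
    for (W, b), (gW, gb) in zip(params, reversed(grads)):
        W -= lr * gW
        b -= lr * gb

class PrioritizedBuffer:
    """Replay buffer that samples items in proportion to their |TD error|."""
    def __init__(self, capacity=10_000):
        self.data, self.prio, self.capacity = [], [], capacity
    def add(self, item, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prio.pop(0)
        self.data.append(item)
        self.prio.append(abs(td_error) + 1e-6)   # keep every priority positive
    def sample(self, batch_size):
        if not self.data:
            return []
        p = np.asarray(self.prio)
        idx = rng.choice(len(self.data), size=min(batch_size, len(self.data)),
                         replace=False, p=p / p.sum())
        return [self.data[i] for i in idx]

# Illustrative dimensions: a 4-D follower state (e.g. position/heading error
# relative to the leader) and a 1-D continuous control (e.g. a roll command).
actor, critic = mlp_init([4, 64, 64, 1]), mlp_init([4, 64, 64, 1])
actor_buf, critic_buf = PrioritizedBuffer(), PrioritizedBuffer()
gamma = 0.99

def cacer_step(s, a, r, s_next, done, batch=16):
    """One learning step: DPER storage plus CACLA-style actor/critic updates."""
    v, v_next = mlp_forward(critic, s)[0], mlp_forward(critic, s_next)[0]
    delta = r + (0.0 if done else gamma * v_next) - v     # TD error
    critic_buf.add((s, r, s_next, done), delta)           # critic sees everything
    if delta > 0:                                         # CACLA rule: keep only
        actor_buf.add((s, a), delta)                      # better-than-expected actions
    for s_i, r_i, sn_i, d_i in critic_buf.sample(batch):  # critic -> TD target
        tgt = r_i + (0.0 if d_i else gamma * mlp_forward(critic, sn_i)[0])
        mlp_sgd_step(critic, s_i, tgt)
    for s_i, a_i in actor_buf.sample(batch):              # actor -> good actions
        mlp_sgd_step(actor, s_i, a_i)

# One simulated interaction with Gaussian exploration noise on the action.
s = rng.normal(size=4)
a = mlp_forward(actor, s) + rng.normal(0.0, 0.1, size=1)
s_next, r = rng.normal(size=4), -float(np.linalg.norm(s))
cacer_step(s, a, r, s_next, done=False)
```

In this sketch the critic is trained by regression toward the one-step TD target and the actor by regression toward exploratory actions that outperformed the current value estimate; this sign-based rule is how CACLA-family methods sidestep the deterministic policy gradient used by DDPG. The "double" in DPER refers to keeping separate prioritized buffers for the two networks.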
Pages: 64-79
Number of Pages: 16
Related Papers
18 items in total
  • [1] Hou, Y. N. 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2017: 316. DOI: 10.1109/SMC.2017.8122622
  • [2] La, Hung Manh; Lim, Ronny; Sheng, Weihua. Multirobot Cooperative Learning for Predator Avoidance. IEEE Transactions on Control Systems Technology, 2015, 23(1): 52-63.
  • [3] Hung, Shao-Ming; Givigi, Sidney N. A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment. IEEE Transactions on Cybernetics, 2017, 47(1): 186-197.
  • [4] Hung, Shao-Ming; Givigi, Sidney; Noureldin, Aboelmagd. A Dyna-Q(λ) Approach to Flocking with Fixed-Wing UAVs in a Stochastic Environment. 2015 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2015: 1918-1923.
  • [5] Kingma, D. P. 2014, arXiv.
  • [6] Morihiro, Koichiro. 2006 SICE-ICASE International Joint Conference, 2006: 4551.
  • [7] Lillicrap, T., et al. Continuous control with deep reinforcement learning. 2015, arXiv. DOI: 10.48550/arXiv.1509.02971
  • [9] Ma, Zhaowei; Wang, Chang; Niu, Yifeng; Wang, Xiangke; Shen, Lincheng. A saliency-based reinforcement learning approach for a UAV to avoid flying obstacles. Robotics and Autonomous Systems, 2018, 100: 108-118.
  • [10] Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533.