Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning

Cited: 13
Authors
Li, Bo [1 ]
Liang, Shiyang [1 ]
Gan, Zhigang [1 ]
Chen, Daqing [2 ]
Gao, Peixin [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China
[2] London South Bank Univ, Sch Engn, London SE1 0AA, England
Keywords
multi-UAV task decision; improved MADDPG algorithm; two-layer experience pool; transfer learning;
DOI
10.1504/IJBIC.2021.118087
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
At present, intelligent algorithms for multi-UAV task decision-making suffer from major issues such as slow learning speed and poor generalisation capability, which make it difficult to obtain the expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, this paper proposes PMADDPG, an improved algorithm based on the multi-agent deep deterministic policy gradient (MADDPG). The algorithm adopts a two-layer experience pool structure to achieve prioritised experience replay: experiences are first stored in a first-layer experience pool, and those more conducive to training and learning are then selected according to priority criteria and placed in a second-layer experience pool, from which experiences are drawn for model training. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments show that, compared with the MADDPG algorithm, the proposed algorithm significantly improves learning speed, task success rate and generalisation capability.
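The two-layer experience pool lends itself to a short illustration. The following is a minimal sketch, not the authors' implementation: the abstract only states that experiences "more conducive to training" are promoted by a priority criterion into the second-layer pool, so the concrete criterion here (TD-error magnitude against a fixed threshold) and all names (`TwoLayerReplayBuffer`, `promote`, `sample`) are assumptions for illustration.

```python
import random
from collections import deque

import numpy as np

class TwoLayerReplayBuffer:
    """Sketch of the two-layer experience pool behind PMADDPG's
    prioritised replay. The promotion rule is an assumed proxy
    (TD-error magnitude), not the paper's exact criterion."""

    def __init__(self, first_capacity=100_000, second_capacity=10_000,
                 priority_threshold=0.5):
        self.first = deque(maxlen=first_capacity)    # layer 1: all experiences
        self.second = deque(maxlen=second_capacity)  # layer 2: high-priority subset
        self.priority_threshold = priority_threshold

    def store(self, state, action, reward, next_state, done, td_error):
        # Every new experience enters the first-layer pool.
        self.first.append((state, action, reward, next_state, done, td_error))

    def promote(self):
        # Move experiences whose priority exceeds the threshold into layer 2.
        for exp in list(self.first):
            *_, td_error = exp
            if abs(td_error) >= self.priority_threshold:
                self.second.append(exp)
                self.first.remove(exp)

    def sample(self, batch_size):
        # Training batches are drawn from the second-layer pool; fall back
        # to layer 1 while layer 2 is still filling up.
        pool = self.second if len(self.second) >= batch_size else self.first
        batch = random.sample(list(pool), batch_size)
        return map(np.array, zip(*batch))
```

The design intuition, as described in the abstract, is that sampling from the smaller second-layer pool concentrates gradient updates on the transitions judged most informative, which is how prioritised replay variants typically accelerate learning relative to uniform sampling from a single pool.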
Pages: 82-91
Page count: 10