Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control

Cited by: 12
Authors
Jeon, Sangwoo [1 ]
Lee, Hoeun [1 ]
Kaliappan, Vishnu Kumar [2 ]
Nguyen, Tuan Anh [2 ]
Jo, Hyungeun [1 ]
Cho, Hyeonseo [1 ]
Min, Dugki [1 ]
Affiliations
[1] Konkuk Univ, Dept Comp Sci & Engn, Seoul 05029, South Korea
[2] Konkuk Univ, Konkuk Aerosp Design Airworthiness Res Inst, Seoul 05029, South Korea
Funding
National Research Foundation of Singapore;
Keywords
air logistics; multiagent reinforcement learning; actor-attention-critic; sensor fusion; multiple UAV;
DOI
10.3390/en15197426
Chinese Library Classification (CLC)
TE [Petroleum and Natural Gas Industry]; TK [Energy and Power Engineering];
Subject Classification Codes
0807 ; 0820 ;
Abstract
The proliferation of unmanned aerial vehicles (UAVs) has spawned a variety of intelligent services, in which efficient coordination plays a significant role in increasing the effectiveness of cooperative execution. However, because of the limited operational time and range of UAVs, achieving highly efficient coordinated actions is difficult, particularly in unknown dynamic environments. This paper proposes a multiagent deep reinforcement learning (MADRL)-based fusion-multiactor-attention-critic (F-MAAC) model for energy-efficient cooperative navigation control of multiple UAVs. The proposed model builds on the multiactor-attention-critic (MAAC) model and introduces two significant advances. The first is a sensor fusion layer, which enables the actor network to utilize all of the required sensor information effectively. The second is a layer that computes dissimilarity weights across agents, added to compensate for the information lost through the attention layer of the MAAC model. We use the UAV LDS (logistic delivery service) environment, built with the Unity engine, to train the proposed model and verify its energy efficiency. A feature that measures the total distance traveled by the UAVs is incorporated into the UAV LDS environment to validate energy efficiency. To demonstrate the performance of the proposed model, F-MAAC is compared with several conventional reinforcement learning models in two use cases. First, we compare the F-MAAC model to the DDPG, MADDPG, and MAAC models based on mean episode rewards over 20k training episodes. The two top-performing models (F-MAAC and MAAC) are then retrained for 150k episodes. Our study measures the total number of deliveries completed within the same period and within the same travel distance to represent energy efficiency. According to our simulation results, the F-MAAC model outperforms the MAAC model, making 38% more deliveries in 3000 time steps and 30% more deliveries per 1000 m of distance traveled.
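To make the abstract's two contributions concrete, below is a minimal PyTorch sketch of (1) a sensor-fusion layer that embeds heterogeneous sensor streams before the actor/critic, and (2) attention over other agents' embeddings rescaled by per-agent dissimilarity weights. All layer sizes, the cosine-based dissimilarity measure, and every name here are illustrative assumptions; this is not the paper's exact F-MAAC architecture.

```python
# Hypothetical sketch of the two F-MAAC ideas described in the abstract.
# Layer sizes and the 1 - cosine-similarity dissimilarity measure are
# assumptions for illustration, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorFusion(nn.Module):
    """Embed each sensor modality separately, then fuse by concatenation."""
    def __init__(self, sensor_dims, embed_dim=64):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, embed_dim) for d in sensor_dims])
        self.fuse = nn.Linear(embed_dim * len(sensor_dims), embed_dim)

    def forward(self, sensors):  # sensors: list of (batch, d_i) tensors
        z = torch.cat([F.relu(enc(s)) for enc, s in zip(self.encoders, sensors)], dim=-1)
        return F.relu(self.fuse(z))

class AttentionWithDissimilarity(nn.Module):
    """Scaled dot-product attention over other agents' embeddings, rescaled
    by a dissimilarity weight (assumed here to be 1 - cosine similarity)."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.q = nn.Linear(embed_dim, embed_dim, bias=False)
        self.k = nn.Linear(embed_dim, embed_dim, bias=False)
        self.v = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, e_i, e_others):  # e_i: (B, D); e_others: (B, N-1, D)
        q = self.q(e_i).unsqueeze(1)               # (B, 1, D)
        k, v = self.k(e_others), self.v(e_others)  # (B, N-1, D)
        attn = torch.softmax((q @ k.transpose(1, 2)) / k.shape[-1] ** 0.5, dim=-1)
        # Dissimilarity weights emphasize agents whose embeddings differ from
        # agent i's, compensating for information the attention layer discounts.
        dis = 1.0 - F.cosine_similarity(e_i.unsqueeze(1), e_others, dim=-1)
        w = attn.squeeze(1) * dis                  # (B, N-1)
        w = w / (w.sum(dim=-1, keepdim=True) + 1e-8)
        return (w.unsqueeze(-1) * v).sum(dim=1)    # fused other-agent features

# Usage: fuse three sensor streams for one agent, then attend over two peers.
fusion = SensorFusion([8, 4, 6])
attn = AttentionWithDissimilarity()
obs = [torch.randn(2, 8), torch.randn(2, 4), torch.randn(2, 6)]
e_i = fusion(obs)                  # (2, 64) fused embedding for agent i
e_others = torch.randn(2, 2, 64)   # embeddings of two other agents
print(attn(e_i, e_others).shape)   # torch.Size([2, 64])
```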
Pages: 18