Population-Invariant MADRL for AoI-Aware UAV Trajectory Design and Communication Scheduling in Wireless Sensor Networks

Cited by: 1
Authors
Zhou, Xuanhan [1]
Xiong, Jun [1]
Zhao, Haitao [1]
Yan, Chao [2]
Wang, Haijun [1]
Wei, Jibo [1]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410073, Hunan, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Automat Engn, Nanjing 211106, Jiangsu, Peoples R China
Funding
National Science Foundation (USA)
Keywords
Autonomous aerial vehicles; Trajectory; Wireless sensor networks; Training; Real-time systems; Internet of Things; Scalability; Scheduling; Data collection; Adaptation models; Age of Information (AoI); multiagent deep reinforcement learning (MADRL); population invariance; trajectory design; unmanned aerial vehicle (UAV); INFORMATION; AGE; INTERNET;
DOI
10.1109/JIOT.2024.3474926
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unmanned aerial vehicles (UAVs) are recognized as effective data collectors for wireless sensor networks. The Age of Information (AoI), a metric indicating data freshness, is crucial for decision making in time-sensitive applications. It can be significantly reduced by jointly optimizing UAV trajectories and communication scheduling of sensor nodes (SNs). However, rapid changes in the environment make it challenging to predesign UAV trajectories and communication scheduling decisions using traditional methods, especially when central controllers are absent and the numbers of UAVs and SNs vary. In this article, we propose hypernetwork-based QMIX (HyperQMIX), a population-invariant multiagent deep reinforcement learning (MADRL) algorithm capable of transferring policies across tasks with varying population sizes. First, we design neural network modules adaptable to varying input and output dimensions, facilitated by parameter generation through a hypernetwork. Then, HyperQMIX leverages these modules to process fluctuations in state and action dimensions. This approach ensures that the network structure remains consistent regardless of population sizes, thereby enhancing the algorithm's scalability. Extensive simulations demonstrate that HyperQMIX significantly outperforms state-of-the-art algorithms in terms of learning efficiency and converged performance. Moreover, agents pretrained with HyperQMIX perform well in tasks of different population sizes without additional training. Fine-tuning these models achieves performance comparable to training from scratch.
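The abstract describes network modules whose parameters are generated by a hypernetwork so that the network structure stays the same as the numbers of UAVs and SNs change. The following is a minimal sketch of that general idea for the input side only, not the authors' HyperQMIX architecture. It assumes that each entity (for example, one SN) is described by a fixed-length feature vector, that a single linear hypernetwork generates a per-entity weight matrix, and that per-entity outputs are summed. All function names, dimensions, and values are hypothetical.

```python
import random


def make_hypernet(feat_dim, hidden_dim, seed=0):
    """Hypothetical linear hypernetwork: one row per generated weight.

    It maps an entity's feat_dim features to feat_dim * hidden_dim weights.
    """
    rng = random.Random(seed)
    n_out = feat_dim * hidden_dim
    return [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
            for _ in range(n_out)]


def generate_weights(hyper, entity, feat_dim, hidden_dim):
    """Generate a (feat_dim x hidden_dim) weight matrix for one entity."""
    flat = [sum(w * x for w, x in zip(row, entity)) for row in hyper]
    return [flat[i * hidden_dim:(i + 1) * hidden_dim]
            for i in range(feat_dim)]


def embed_entities(hyper, entities, feat_dim, hidden_dim):
    """Embed any number of entities into a fixed hidden_dim vector.

    Each entity is multiplied by its own generated weights, and the results
    are summed, so the output size does not depend on len(entities).
    """
    out = [0.0] * hidden_dim
    for entity in entities:
        weights = generate_weights(hyper, entity, feat_dim, hidden_dim)
        for i, x in enumerate(entity):
            for j in range(hidden_dim):
                out[j] += x * weights[i][j]
    return out


# Illustrative feature vectors (assumed values, not data from the paper).
hyper = make_hypernet(feat_dim=3, hidden_dim=4)
few = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
many = few + [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2], [0.0, 0.5, 1.0]]

emb_few = embed_entities(hyper, few, 3, 4)    # 2 entities -> 4 values
emb_many = embed_entities(hyper, many, 3, 4)  # 5 entities -> 4 values
```

Because the hypernetwork parameters depend only on the per-entity feature length, the same parameters can be reused when the population grows from two entities to five. The summation also makes the embedding independent of entity order. Handling a varying number of actions, which the abstract also mentions, would need an analogous output-side module and is not covered by this sketch.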
Pages: 2545-2561
Page count: 17