LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UAV-Assisted Sensor Network

Cited by: 35
Authors
Li, Kai [1 ]
Ni, Wei [2 ]
Dressler, Falko [3 ]
Affiliations
[1] Real-Time & Embedded Computing Systems Research Centre (CISTER), P-4249015 Porto, Portugal
[2] Commonwealth Scientific & Industrial Research Organisation (CSIRO), Digital Productivity & Services Flagship, Sydney, NSW 2122, Australia
[3] TU Berlin, School of Electrical Engineering & Computer Science, D-10587 Berlin, Germany
Keywords
Wireless sensor networks; Batteries; Trajectory; Reinforcement learning; Resource management; Unmanned aerial vehicles; Trajectory planning; Deep deterministic policy gradient (DDPG); experimental data sets; flight trajectory; long short-term memory (LSTM); resource allocation; unmanned aerial vehicles (UAVs); WIRELESS; THROUGHPUT
DOI
10.1109/JIOT.2021.3102831
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Unmanned aerial vehicles (UAVs) can be employed to collect sensory data in remote wireless sensor networks (WSNs). Due to the UAV's maneuvering, scheduling one sensor device to transmit data can overflow the data buffers of the unscheduled ground devices. Moreover, lossy airborne channels can result in packet reception errors at the scheduled sensor. This article proposes a new deep reinforcement learning-based flight resource allocation framework (DeFRA) to minimize the overall data packet loss in a continuous action space. Based on deep deterministic policy gradient (DDPG), DeFRA optimally controls the instantaneous heading and speed of the UAV and selects the ground device for data collection. Furthermore, a state characterization layer leveraging long short-term memory (LSTM) is developed to predict network dynamics resulting from time-varying airborne channels and energy arrivals at the ground devices. To validate the effectiveness of DeFRA, experimental data collected from a real-world UAV testbed and an energy-harvesting WSN are utilized to train the actions of the UAV. Numerical results demonstrate that the proposed DeFRA achieves fast convergence while reducing packet loss by over 15% compared to existing deep reinforcement learning solutions.
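The record does not include source code; as a rough illustration of the architecture the abstract describes, the following PyTorch sketch shows how an LSTM state characterization layer can front a DDPG actor that outputs the UAV's heading, speed, and a ground-device selection. All class and parameter names, dimensions, and the relaxation of device selection to continuous scores are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LSTMCharacterizedActor(nn.Module):
    """Sketch of a DDPG actor fronted by an LSTM state characterization
    layer: the LSTM summarizes a window of recent network observations
    (e.g., channel quality, buffer occupancy, battery levels), and small
    heads map the summary to the UAV's continuous actions."""

    def __init__(self, obs_dim: int, num_devices: int, hidden_dim: int = 64):
        super().__init__()
        # State characterization layer: captures time-varying channel
        # and energy-arrival dynamics from the observation history.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.trunk = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.heading_head = nn.Linear(hidden_dim, 1)
        self.speed_head = nn.Linear(hidden_dim, 1)
        self.device_head = nn.Linear(hidden_dim, num_devices)

    def forward(self, obs_seq: torch.Tensor):
        # obs_seq: (batch, window, obs_dim) history of observations.
        _, (h_n, _) = self.lstm(obs_seq)
        z = self.trunk(h_n[-1])                     # final hidden state of last LSTM layer
        heading = torch.tanh(self.heading_head(z))  # in [-1, 1]; rescale to [-pi, pi]
        speed = torch.sigmoid(self.speed_head(z))   # in [0, 1]; rescale to [0, v_max]
        scores = self.device_head(z)                # per-device selection scores
        return heading, speed, scores

# Usage sketch: a window of 8 past observations with 6 features each, 10 devices.
actor = LSTMCharacterizedActor(obs_dim=6, num_devices=10)
heading, speed, scores = actor(torch.randn(1, 8, 6))
device = scores.argmax(dim=-1)  # hard selection at execution time
```

Emitting continuous scores and taking an argmax is one common way to fold a discrete choice into DDPG's continuous action space; how DeFRA actually handles device selection is detailed in the paper itself.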
Pages: 4179-4189
Page count: 11