Energy Minimization in UAV-Aided Networks: Actor-Critic Learning for Constrained Scheduling Optimization

Cited by: 35
Authors
Yuan, Yaxiong [1 ]
Lei, Lei [1 ]
Vu, Thang X. [1 ]
Chatzinotas, Symeon [1 ]
Sun, Sumei [2 ]
Ottersten, Bjorn [1 ]
Affiliations
[1] University of Luxembourg, Interdisciplinary Centre for Security, Reliability and Trust (SnT), L-1855 Kirchberg, Luxembourg
[2] Agency for Science, Technology and Research (A*STAR), Institute for Infocomm Research, Singapore 138632, Singapore
Keywords
Optimization; Trajectory; Heuristic algorithms; Unmanned aerial vehicles; Resource management; Propulsion; Task analysis; UAV; deep reinforcement learning; user scheduling; hovering time allocation; energy optimization; actor-critic; TRAJECTORY OPTIMIZATION; RESOURCE-ALLOCATION; FAIR COMMUNICATION; EFFICIENT; CHANNEL; DESIGN; RADIO
DOI
10.1109/TVT.2021.3075860
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
In unmanned aerial vehicle (UAV) applications, the UAV's limited energy supply and storage have motivated the development of intelligent, energy-conserving scheduling solutions. In this paper, we investigate energy minimization for UAV-aided communication networks by jointly optimizing data-transmission scheduling and UAV hovering time. The formulated problem is combinatorial and non-convex with bilinear constraints. To tackle it, we first provide an optimal algorithm (OPT) and a golden-section-search heuristic (GSS-HEU). Both serve as offline performance benchmarks but may not be suitable for online operation. To this end, from a deep reinforcement learning (DRL) perspective, we propose an actor-critic-based deep stochastic online scheduling (AC-DSOS) algorithm and develop a set of approaches to confine the action space. Compared to conventional RL/DRL, the novelty of AC-DSOS lies in handling two major issues: the exponentially growing action space and infeasible actions. Numerical results show that AC-DSOS provides feasible solutions and saves around 25-30% energy compared to two conventional deep AC-DRL algorithms. Compared to the developed GSS-HEU, AC-DSOS consumes around 10% more energy but reduces the computation time from the order of seconds to milliseconds.
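The abstract names two technical ingredients: a golden-section-search heuristic (GSS-HEU) used as an offline benchmark, and an action-space confinement mechanism inside AC-DSOS that keeps the actor away from infeasible actions. The two sketches below only illustrate these generic ideas in isolation; they are not the paper's algorithms. The first is a textbook golden-section search over a single hovering-time variable, with a purely illustrative energy curve (the actual GSS-HEU operates on the joint scheduling and hovering-time problem described in the full text).

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal scalar function f on [lo, hi] via golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # ~0.618, the inverse golden ratio
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                    # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Toy energy-vs-hovering-time curve (assumed unimodal; NOT the paper's model):
# hovering too briefly forces high transmit power, hovering too long wastes propulsion energy.
energy = lambda t: 50.0 / t + 2.0 * t
t_star = golden_section_minimize(energy, 0.1, 60.0)   # ~5.0 for this toy curve
```

The second sketch shows one common way to confine an actor-critic agent's action space: mask infeasible actions before normalizing the actor's output into a probability distribution, so infeasible actions receive zero probability. Whether AC-DSOS uses exactly this mechanism is detailed in the full paper; the function name and toy scheduling scenario below are illustrative assumptions only.

```python
import numpy as np

def masked_softmax(logits, feasible_mask):
    """Turn raw actor logits into a distribution over feasible actions only."""
    masked = np.where(feasible_mask, logits, -np.inf)  # infeasible actions get -inf
    masked = masked - np.max(masked)                   # shift for numerical stability
    exp = np.exp(masked)                               # exp(-inf) == 0
    return exp / exp.sum()

# Toy example: 4 candidate user-scheduling actions; actions 2 and 4 violate a
# constraint (e.g., the user's demand is already satisfied) and are masked out.
logits = np.array([1.2, 0.3, -0.5, 2.0])
feasible = np.array([True, False, True, False])
probs = masked_softmax(logits, feasible)               # zero mass on masked actions
action = np.random.choice(len(probs), p=probs)         # sampled action is always feasible
```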
Pages: 5028-5042
Page count: 15