Dynamic Beam Hopping Method Based on Multi-Objective Deep Reinforcement Learning for Next Generation Satellite Broadband Systems

Cited by: 131
Authors
Hu, Xin [1 ]
Zhang, Yuchen [1 ]
Liao, Xianglai [1 ]
Liu, Zhijun [1 ]
Wang, Weidong [1 ]
Ghannouchi, Fadhel M. [2 ,3 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Elect Engn, Beijing 100876, Peoples R China
[2] Univ Calgary, Dept Elect & Comp Engn, Intelligent RF Radio Lab, Calgary, AB T2N 1N4, Canada
[3] Univ Calgary, Schulich Sch Engn, Calgary, AB T2N 1N4, Canada
Funding
National Natural Science Foundation of China
Keywords
Satellite broadcasting; Delays; Throughput; Resource management; Reinforcement learning; Digital video broadcasting; Multi-beam satellite; beam hopping; differentiated services; deep reinforcement learning; multi-objective; multi-action selection; RESOURCE-MANAGEMENT; ALLOCATION; POWER; GAME;
DOI
10.1109/TBC.2019.2960940
Chinese Library Classification (CLC)
TM (Electrical Technology); TN (Electronics & Communication Technology)
Discipline codes
0808; 0809
Abstract
Given the inherent uncertainty of differentiated service requirements and the non-uniform spatial distribution of capacity requests, satellite resources must be adjusted flexibly to satisfy varying conditions. Matching the system capacity demand with efficient beam utilization is a brand-new challenge. Conventional beam hopping methods ignore the intrinsic correlation between decisions, do not consider the long-term reward, and achieve only a solution that is optimal at the current time; consequently, system complexity increases significantly as the demand for differentiated services or the number of beams grows. This paper investigates the optimal beam hopping policy for a DVB-S2X satellite with the multiple objectives of ensuring fairness among beam services, minimizing the transmission delay of real-time services, and maximizing the throughput of non-real-time services. Since wireless channel conditions and differentiated-service arrival rates are stochastic, and the dynamics of the multi-beam satellite environment are unknown, a model-free multi-objective deep reinforcement learning approach is used to learn the optimal policy through interactions with the environment. To address the curse of dimensionality in the action space, a novel multi-action selection method based on Double-Loop Learning (DLL) is proposed. Moreover, the multi-dimensional state is reformulated and processed by a deep neural network. Evaluation results obtained under realistic conditions demonstrate that the proposed method can pursue multiple objectives simultaneously and can allocate resources intelligently, adapting to user requirements and channel conditions.
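The abstract describes scalarizing three objectives (fairness, delay, throughput) into one reward and selecting several beams to illuminate per hop. The sketch below is a minimal, hypothetical illustration of those two ideas only, not the paper's actual DLL method: the objective weights, Jain's fairness index, and the greedy top-k beam selection over per-beam value estimates are all assumptions for illustration.

```python
import numpy as np

N_BEAMS = 8                           # candidate beams (assumed)
K_ACTIVE = 3                          # beams illuminated per hop (assumed)
WEIGHTS = np.array([0.4, 0.3, 0.3])   # fairness / negative-delay / throughput weights (assumed)

def jain_fairness(served):
    """Jain's fairness index over per-beam served traffic; 1.0 means perfectly even."""
    s = np.asarray(served, dtype=float)
    return float(s.sum() ** 2 / (len(s) * (s ** 2).sum() + 1e-12))

def scalarized_reward(fairness, neg_delay, throughput, w=WEIGHTS):
    """Linear scalarization: combine the three objectives into one scalar reward."""
    return float(w @ np.array([fairness, neg_delay, throughput]))

def select_beams(q_values, k=K_ACTIVE):
    """Greedy multi-action selection: pick the k beams with the highest value estimates."""
    return np.argsort(np.asarray(q_values))[-k:]

# Toy step: random per-beam value estimates stand in for a learned network's output.
rng = np.random.default_rng(0)
q = rng.random(N_BEAMS)
active = select_beams(q)
served = np.zeros(N_BEAMS)
served[active] = 1.0
reward = scalarized_reward(jain_fairness(served), -0.1, served.sum())
```

In a full agent, `q` would come from a deep network conditioned on the multi-dimensional state, and the scalarized reward would drive the learning update; here both are stubbed with random values.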
Pages: 630-646 (17 pages)
Cited References (32 total; first 10 shown)
[1] Abadi, M. ACM SIGPLAN Notices, 2016, 51: 1. DOI: 10.1145/3022670.2976746, 10.1145/2951913.2976746.
[2] Andreotti, R.; Giannetti, F.; Luise, M. "Energy-efficient link resource allocation in the multibeam satellite downlink under QoS constraints." International Journal of Satellite Communications and Networking, 2016, 34(5): 661-678.
[3] Angeletti, P. Proc. 24th AIAA Int. Com., 2006, p. 53.
[4] [Anonymous], 2005, Dig. Vid. Broadc. DVB 2.
[5] [Anonymous], 1997, Rec. S.672-4.
[6] Choi, J. P.; Chan, V. W. S. "Resource Management for Advanced Transmission Antenna Satellites." IEEE Transactions on Wireless Communications, 2009, 8(3): 1308-1321.
[7] Choi, J. W. P.; Chan, V. W. S. "Optimum power and beam allocation based on traffic demands and channel conditions over satellite downlinks." IEEE Transactions on Wireless Communications, 2005, 4(6): 2983-2993.
[8] Cocco, G.; de Cola, T.; Angelone, M.; Katona, Z.; Erl, S. "Radio Resource Management Optimization of Flexible Satellite Payloads for DVB-S2 Systems." IEEE Transactions on Broadcasting, 2018, 64(2): 266-280.
[9] Han, H.; Zheng, X.; Huang, Q.; Lin, Y. "QoS-equilibrium slot allocation for beam hopping in broadband satellite communication systems." Wireless Networks, 2015, 21(8): 2617-2630.
[10] He, Y.; Zhao, N.; Yin, H. "Integrated Networking, Caching, and Computing for Connected Vehicles: A Deep Reinforcement Learning Approach." IEEE Transactions on Vehicular Technology, 2018, 67(1): 44-55.