DeepNap: Data-Driven Base Station Sleeping Operations Through Deep Reinforcement Learning

Cited by: 72
Authors
Liu, Jingchu [1,2]
Krishnamachari, Bhaskar [3]
Zhou, Sheng [1]
Niu, Zhisheng [1]
Affiliations
[1] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Horizon Robot, Smart Driving Div, Beijing 100089, Peoples R China
[3] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089 USA
Source
IEEE INTERNET OF THINGS JOURNAL | 2018, Vol. 5, Issue 6
Keywords
Base station (BS) sleeping; deep Q-network (DQN); deep reinforcement learning (RL); nonstationary traffic; ENERGY-DELAY TRADEOFFS; INTERNET; NETWORKS; THINGS; IOT;
DOI
10.1109/JIOT.2018.2846694
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Base station (BS) sleeping is an effective way to reduce the energy consumption of mobile networks. Previous efforts to design sleeping control algorithms mainly rely on stochastic traffic models and analytical derivation. However, the tractability of such models often conflicts with the complexity of real-world traffic, making them difficult to apply in practice. In this paper, we propose a data-driven algorithm for dynamic sleeping control called DeepNap. The algorithm uses a deep Q-network (DQN) to learn effective sleeping policies from high-dimensional raw observations or unquantized system state vectors. We enhance the original DQN algorithm with action-wise experience replay and adaptive reward scaling to deal with the challenges posed by nonstationary traffic. We also provide a model-assisted variant of DeepNap that uses the Dyna framework to infer and simulate system dynamics. Periodic traffic modeling makes it possible to capture the nonstationarity of real-world traffic, and its incorporation into the DQN allows for feature learning and generalization from model outputs. Experiments show that both the end-to-end and the model-assisted versions of DeepNap outperform table-based Q-learning, and that the nonstationarity enhancements improve the stability of the vanilla DQN.
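The two DQN enhancements named in the abstract are described here only at a high level. The following is a minimal, illustrative Python sketch of what action-wise experience replay and adaptive reward scaling could look like; all class names, method signatures, and parameter values (ActionWiseReplayBuffer, AdaptiveRewardScaler, capacity_per_action, decay) are assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import deque

class ActionWiseReplayBuffer:
    """Sketch of action-wise experience replay: one FIFO buffer per
    action, so transitions for rarely chosen actions (e.g. 'sleep')
    are not crowded out by the dominant action when the traffic,
    and hence the policy's action mix, is nonstationary."""

    def __init__(self, num_actions, capacity_per_action=10_000):
        self.buffers = [deque(maxlen=capacity_per_action)
                        for _ in range(num_actions)]

    def add(self, state, action, reward, next_state, done):
        # Store the transition in the buffer of the action taken.
        self.buffers[action].append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw roughly the same number of transitions per action,
        # skipping buffers that are still empty.
        non_empty = [b for b in self.buffers if b]
        per_action = max(1, batch_size // len(non_empty))
        batch = []
        for buf in non_empty:
            k = min(per_action, len(buf))
            batch.extend(random.sample(buf, k))
        return batch


class AdaptiveRewardScaler:
    """Sketch of adaptive reward scaling: divide each reward by a
    running estimate of its magnitude so Q-value targets stay in a
    stable numeric range as traffic statistics drift."""

    def __init__(self, decay=0.999, eps=1e-6):
        self.decay = decay
        self.eps = eps
        self.scale = 1.0

    def __call__(self, reward):
        # Exponential moving average of |reward| as the scale estimate.
        self.scale = self.decay * self.scale + (1 - self.decay) * abs(reward)
        return reward / (self.scale + self.eps)
```

In the same hedged spirit, the model-assisted variant could interleave real and model-generated experience in a Dyna-style loop; `traffic_model.update` and `traffic_model.sample_transition` below are hypothetical stand-ins for the paper's periodic traffic model, not a published API.

```python
def dyna_training_step(env_transition, traffic_model, buffer, scaler,
                       num_simulated=4):
    """Sketch of a Dyna-style step: store the real transition, refine
    the traffic model online, then replay a few simulated transitions
    through the same buffer so the DQN also learns from the model."""
    s, a, r, s_next, done = env_transition
    buffer.add(s, a, scaler(r), s_next, done)   # learn from real data
    traffic_model.update(s, a, r, s_next)       # refine the model online
    for _ in range(num_simulated):              # plan with simulated data
        s, a, r, s_next, done = traffic_model.sample_transition()
        buffer.add(s, a, scaler(r), s_next, done)
```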
Pages: 4273-4282
Page count: 10