A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning

Cited by: 208
Authors
Liu, Ning [1 ]
Li, Zhe [1 ]
Xu, Jielong [1 ]
Xu, Zhiyuan [1 ]
Lin, Sheng [1 ]
Qiu, Qinru [1 ]
Tang, Jian [1 ]
Wang, Yanzhi [1 ]
Affiliations
[1] Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244 USA
Source
2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017) | 2017
Keywords
Deep reinforcement learning; hierarchical framework; resource allocation; distributed algorithm; neural networks
DOI
10.1109/ICDCS.2017.123
Chinese Library Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in cloud computing systems. However, a complete cloud resource allocation framework exhibits high dimensionality in its state and action spaces, which limits the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in the design and control of cloud computing systems, as it degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while keeping performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system, and a novel solution framework is necessary to address the even higher dimensionality of the joint state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of the local servers. The emerging deep reinforcement learning (DRL) technique, which can handle complicated control problems with large state spaces, is adopted to solve the global-tier problem. Furthermore, an autoencoder and a novel weight-sharing structure are adopted to handle the high-dimensional state space and accelerate convergence. The local tier of distributed server power management comprises an LSTM-based workload predictor and a model-free RL-based power manager, operating in a distributed manner.
Experimental results using actual Google cluster traces show that the proposed hierarchical framework significantly reduces power consumption and energy usage compared with the baseline while incurring no severe latency degradation, and that it achieves the best trade-off between latency and power/energy consumption in a server cluster.
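To make the local tier concrete, the following is a minimal illustrative sketch, not the authors' implementation: a tiny tabular Q-learning power manager that chooses between sleeping and staying active each decision epoch, paired with a moving-average stand-in for the paper's LSTM workload predictor. All class names, state discretizations, and reward terms here are assumptions for illustration only.

```python
import random

class PowerManager:
    """Model-free RL agent (tabular Q-learning) choosing sleep vs. active
    per decision epoch, in the spirit of the paper's local tier."""

    def __init__(self, n_load_levels=4, actions=("sleep", "active"),
                 alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        # Q-table over (discretized load level, action) pairs.
        self.q = {(s, a): 0.0 for s in range(n_load_levels) for a in actions}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next):
        # Standard Q-learning update toward the bootstrapped target.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(s, a)]
        )

def predict_load(history, window=3):
    """Stand-in for the LSTM workload predictor: a simple moving average
    over the most recent observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def step_reward(action, load):
    """Toy reward: penalize power draw when active and latency when the
    server sleeps under load (the paper optimizes a richer trade-off)."""
    power_cost = 1.0 if action == "active" else 0.1
    latency_cost = load if action == "sleep" else 0.0
    return -(power_cost + latency_cost)
```

In the full framework, the global DRL tier would assign VMs to servers, and each server would run an agent like this one locally, feeding the predicted workload into its state. Replacing `predict_load` with a trained LSTM and the Q-table with a deep network is what distinguishes the paper's approach from this sketch.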
Pages: 372 - 382 (11 pages)