Learn-as-you-go with Megh: Efficient Live Migration of Virtual Machines

Cited by: 49

Authors:

Basu, Debabrota [1]

Wang, Xiayang [2]

Hong, Yang [2]

Chen, Haibo [2]

Bressan, Stephane [1]

Affiliations:

[1] Natl Univ Singapore, Sch Comp, Dept Comp Sci, Singapore 119077, Singapore

[2] Shanghai Jiao Tong Univ, Inst Parallel & Distributed Syst, Shanghai 200240, Peoples R China

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2019 / Vol. 30 / No. 08

Funding:

National Research Foundation, Singapore;

Keywords:

Cloud computing; reinforcement learning; virtual machine; live migration; Markov decision process; energy efficiency; performance efficiency; ENERGY; CONSOLIDATION;

DOI:

10.1109/TPDS.2019.2893648

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Cloud providers leverage live migration of virtual machines to reduce energy consumption and allocate resources efficiently in data centers. Each migration decision answers three questions: when to move a virtual machine, which virtual machine to move, and where to move it. Dynamic, uncertain, and heterogeneous workloads running on virtual machines make such decisions difficult. Knowledge-based and heuristics-based algorithms are commonly used to tackle this problem. Knowledge-based algorithms, such as MaxWeight scheduling algorithms, depend on the specifics and the dynamics of the targeted Cloud architectures and applications. Heuristics-based algorithms, such as MMT algorithms, suffer from high variance and poor convergence because of their greedy approach. We propose an online reinforcement learning algorithm called Megh. Megh does not require prior knowledge of the workload; rather, it learns the dynamics of workloads as it goes. Megh models the problem of energy- and performance-efficient resource management during live migration as a Markov decision process and solves it using a functional approximation scheme. While several reinforcement learning algorithms have been proposed to solve this problem, these algorithms remain confined to the academic realm because they face the curse of dimensionality. They either do not scale in real time, as is the case with MadVM, or need elaborate offline training, as is the case with Q-learning. These algorithms often incur execution overheads that are comparable to the migration time of a VM. Megh overcomes these deficiencies. Megh uses a novel dimensionality reduction scheme to project the combinatorially explosive state-action space to a polynomial-dimensional space with a sparse basis. Megh has the capacity to learn uncertain dynamics and the ability to work in real time without incurring significant execution overhead. Megh is both scalable and robust.
We implement Megh using the CloudSim toolkit and empirically evaluate its performance with the PlanetLab and the Google Cluster workloads. Experiments validate that Megh is more cost-effective, converges faster, incurs smaller execution overhead, and is more scalable than MadVM and MMT. An empirical sensitivity analysis explicates the choice of parameters in experiments.
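
To make the dimensionality argument in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that each action "migrate VM i to host j" activates exactly one coordinate of a weight vector of length N x M (a sparse, polynomial-size basis), whereas the number of ways to place N VMs on M hosts grows as M^N. The update rule is a plain temporal-difference step chosen only for illustration; the abstract does not specify the solver Megh uses. The class name `SparseLinearQ`, the helper `feature_index`, and the step-size and discount values are all hypothetical.

```python
def feature_index(vm, host, num_hosts):
    """Map the action 'migrate vm to host' to its single active coordinate."""
    return vm * num_hosts + host


class SparseLinearQ:
    """Cost-to-go estimate that is linear in one-hot (vm, host) features.

    Illustrative assumption: one weight per (vm, host) pair, so the model
    has num_vms * num_hosts parameters instead of one per joint placement.
    """

    def __init__(self, num_vms, num_hosts, step_size=0.1, discount=0.9):
        self.num_vms = num_vms
        self.num_hosts = num_hosts
        self.step_size = step_size
        self.discount = discount
        self.theta = [0.0] * (num_vms * num_hosts)

    def cost_to_go(self, vm, host):
        # With a one-hot feature vector, the dot product is a single lookup.
        return self.theta[feature_index(vm, host, self.num_hosts)]

    def best_action(self):
        """Greedy action: the (vm, host) pair with the lowest estimated cost."""
        best = min(range(len(self.theta)), key=self.theta.__getitem__)
        return divmod(best, self.num_hosts)

    def update(self, vm, host, cost, next_vm, next_host):
        """One temporal-difference step; only the active coordinate changes."""
        target = cost + self.discount * self.cost_to_go(next_vm, next_host)
        i = feature_index(vm, host, self.num_hosts)
        self.theta[i] += self.step_size * (target - self.theta[i])


if __name__ == "__main__":
    q = SparseLinearQ(num_vms=3, num_hosts=2)
    print(len(q.theta), "weights versus", 2 ** 3, "joint placements")
    q.update(vm=1, host=0, cost=2.0, next_vm=0, next_host=0)
    print(q.best_action())
```

Because each update touches one weight, the per-step work in this toy model is constant, which is the kind of property that keeps decision overhead small relative to the migration time of a VM.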

Pages: 1786-1801

Page count: 16
