Content-Based Scheduling of Virtual Machines (VMs) in the Cloud

Cited by: 27
Authors
Bazarbayev, Sobir [1]
Hiltunen, Matti [2]
Joshi, Kaustubh [2]
Sanders, William H. [1]
Schlichting, Richard [2]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] AT&T Labs Res, Urbana, IL 61801 USA
Source
2013 IEEE 33RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS) | 2013
Keywords
Scheduling; Virtualization; Data center; Cloud computing
DOI
10.1109/ICDCS.2013.15
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Organizations of all sizes are moving their IT infrastructure to the cloud because of its cost efficiency and convenience. Because Infrastructure as a Service (IaaS) clouds provision resources on demand, hundreds of thousands of virtual machines (VMs) may be deployed and terminated in a single large cloud data center each day. In this paper, we propose a content-based scheduling algorithm for placing VMs in data centers. We exploit the fact that disk images of VMs with similar operating systems often share identical disk blocks: by scheduling VMs with high content similarity on the same hosts, we reduce the amount of data that must be transferred when a VM is deployed on a destination host. First, we present a study of content similarity between VMs, based on a large set of VM images whose operating systems represent the majority of popular operating systems in use today. Our analysis shows that content similarity between VMs with the same operating system and close version numbers (e.g., Ubuntu 12.04 vs. Ubuntu 11.10) can be as high as 60%, whereas VMs with different operating systems have close to zero content similarity. Second, based on these results, we design a content-based scheduling algorithm that lowers the network traffic associated with transferring VM disk images inside data centers. Our experimental results show that the amount of data transferred when deploying VMs and moving virtual disk images can be reduced by more than 70%, yielding significant savings in data center network utilization and congestion.
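The abstract does not say how content similarity is measured or how a host is chosen. Purely as an illustration, the following Python sketch assumes fixed-size 4 KiB blocks identified by SHA-1 digests, Jaccard similarity over unique blocks as the similarity measure, and a greedy rule that places each VM on the host (with a free slot) whose cached blocks already cover the most of the VM's blocks, so that only missing blocks are transferred. The names `BLOCK_SIZE`, `block_fingerprints`, `jaccard_similarity`, and `place_vm` are hypothetical; this is not the authors' algorithm.

```python
import hashlib
from typing import Dict, Set, Tuple

# Hypothetical block granularity in bytes; the paper's actual block size is not stated here.
BLOCK_SIZE = 4096

Fingerprints = Set[bytes]


def block_fingerprints(image: bytes, block_size: int = BLOCK_SIZE) -> Fingerprints:
    """Split a disk image into fixed-size blocks and return the set of their SHA-1 digests."""
    return {
        hashlib.sha1(image[i:i + block_size]).digest()
        for i in range(0, len(image), block_size)
    }


def jaccard_similarity(a: Fingerprints, b: Fingerprints) -> float:
    """Assumed similarity measure: shared unique blocks / all unique blocks."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def place_vm(
    vm: Fingerprints,
    host_cache: Dict[str, Fingerprints],
    free_slots: Dict[str, int],
    block_size: int = BLOCK_SIZE,
) -> Tuple[str, int]:
    """Greedy content-aware placement (an assumed policy, not the authors' algorithm).

    Picks the host with a free slot whose cached blocks cover the most of the VM's
    blocks, updates that host's cache and capacity, and returns the host name and an
    upper bound on the bytes to transfer (unique missing blocks times block size).
    """
    candidates = [h for h, slots in free_slots.items() if slots > 0]
    if not candidates:
        raise RuntimeError("no host has free capacity")
    # Most shared blocks first; ties broken by host name so the result is deterministic.
    best = min(candidates, key=lambda h: (-len(vm & host_cache.get(h, set())), h))
    missing = vm - host_cache.get(best, set())
    host_cache.setdefault(best, set()).update(vm)
    free_slots[best] -= 1
    return best, len(missing) * block_size


if __name__ == "__main__":
    def make_image(*labels: str) -> bytes:
        # Toy "disk image": each single-character label becomes one block of repeated bytes.
        return b"".join(label.encode() * BLOCK_SIZE for label in labels)

    new_vm = block_fingerprints(make_image("a", "b", "c", "d", "e"))
    cache = {
        "h1": block_fingerprints(make_image("a", "b", "c", "x", "y")),  # same OS family
        "h2": block_fingerprints(make_image("p", "q", "r", "s", "t")),  # different OS
    }
    slots = {"h1": 1, "h2": 1}
    host, sent = place_vm(new_vm, cache, slots)
    print(host, sent)  # prints: h1 8192
```

In the demo, the new VM shares three of its five blocks with h1's cache, so only two blocks (8,192 of 20,480 bytes) are sent, a 60% reduction on this contrived input. The more than 70% savings reported above come from the authors' experiments, not from this sketch. A real scheduler would also weigh CPU, memory, and network constraints, which the sketch reduces to a single slot count per host.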
Pages: 93-101
Number of pages: 9