Improving Scalability of Cloud Monitoring Through PCA-Based Clustering of Virtual Machines

Cited by: 22
Authors
Canali, Claudia [1]
Lancellotti, Riccardo [1]
Affiliation
[1] Univ Modena & Reggio Emilia, Dept Informat Engn, I-41125 Modena, Italy
Keywords
cloud computing; resource monitoring; principal component analysis; k-means clustering; MANAGEMENT; PERFORMANCE; ALGORITHMS; SERVICES;
DOI
10.1007/s11390-013-1410-9
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing has recently emerged as a leading paradigm that allows customers to run their applications in virtualized large-scale data centers. Existing solutions for monitoring and management of these infrastructures consider virtual machines (VMs) as independent entities with their own characteristics. However, these approaches suffer from scalability issues due to the increasing number of VMs in modern cloud data centers. We claim that scalability issues can be addressed by leveraging the similarity in VM behavior in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster VMs starting from the usage of multiple resources, assuming no knowledge of the services executed on them. The innovative contribution of the proposed methodology is the use of the statistical technique known as principal component analysis (PCA) to automatically select the most relevant information to cluster similar VMs. We apply the methodology to two case studies, a virtualized testbed and a real enterprise data center. In both case studies, the automatic data selection based on PCA allows us to achieve high performance, with a percentage of correctly clustered VMs between 80% and 100% even for short time series (1 day) of monitored data. Furthermore, we estimate the potential reduction in the amount of collected data to demonstrate how our proposal may address the scalability issues related to monitoring and management in cloud computing data centers.
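The general idea of the abstract (reduce per-VM resource-usage features with PCA, then group similar VMs with k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature layout (six utilization values per VM, already normalized to [0, 1]), the 90% explained-variance threshold, the farthest-first initialization, the synthetic data, and the purity score are all assumptions chosen only for illustration.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.9):
    """Project rows of X onto the fewest principal components whose
    cumulative explained variance reaches var_threshold (assumed value)."""
    Xc = X - X.mean(axis=0)                      # center each metric
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance ratio per component
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(S))
    return Xc @ Vt[:k].T

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])          # point farthest from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)             # assign each VM to nearest center
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def purity(labels, truth):
    """Fraction of VMs that fall in the majority class of their cluster."""
    correct = 0
    for j in np.unique(labels):
        _, counts = np.unique(truth[labels == j], return_counts=True)
        correct += counts.max()
    return correct / len(labels)

# Synthetic data (hypothetical): two groups of 10 VMs, 6 usage metrics each.
rng = np.random.default_rng(0)
profile_a = np.array([0.7, 0.2, 0.1, 0.6, 0.1, 0.1])
profile_b = np.array([0.2, 0.7, 0.6, 0.1, 0.1, 0.1])
X = np.vstack([
    profile_a + rng.normal(0, 0.02, size=(10, 6)),
    profile_b + rng.normal(0, 0.02, size=(10, 6)),
])
truth = np.array([0] * 10 + [1] * 10)

reduced = pca_reduce(X)          # keep only the most informative components
labels = kmeans(reduced, k=2)    # cluster VMs in the reduced space
print(reduced.shape, purity(labels, truth))
```

In this toy setting the two usage profiles differ in four of the six metrics, so PCA discards the uninformative dimensions before clustering. On real monitoring data, the choice of per-VM features and of the number of retained components would need to follow the paper itself.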
Pages: 38-52
Page count: 15