App-Centric and Environment-Aware Monitoring and Diagnosis in the Cloud

Cited by: 0
Authors
Carvalho, Tiago [1]
Kim, Hyong S. [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2017 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC) | 2017
Keywords
distributed; monitoring; cloud; integrated diagnosis;
DOI
Not available
Chinese Library Classification
TN [Electronic technology, communication technology];
Subject classification code
0809;
Abstract
Infrastructure-as-a-Service environments are becoming increasingly popular. When a failure occurs, many applications require service restoration within a few seconds, yet reaction to failures in the cloud is still too slow for many applications. Monitoring is limited to instance metrics, which are not conducive to precise diagnosis because of the complexity of virtualization on physical hosts. Interference among different VMs further complicates diagnosis. We propose a new dynamic monitoring module as part of a multi-agent-based cloud management framework named LAMA. Applications are at the center of our framework. Agents distributed throughout the cloud infrastructure are responsible for aggregating metrics and performing customized diagnostics for each application. This reduces overhead, removes centralized bottlenecks, and allows customized configuration with finer granularity. Our approach is also environment-aware, as each app agent has access to application, virtual instance, and hosting infrastructure metrics. This feature enables the creation of more efficient diagnostic algorithms customized to each application's needs. We develop and deploy LAMA in our datacenter to demonstrate (1) how an integrated approach with access to the app's state improves the efficiency of failure detection, (2) how our monitoring and diagnosis architecture can improve load distribution in the network, and (3) the impact of finer granularity on failure detection time.
Pages: 7