State Monitoring in Cloud Datacenters

Cited by: 23
Authors
Meng, Shicong [1]
Liu, Ling [2]
Wang, Ting [3]
Affiliations
[1] Georgia Inst Technol, Coll Comp, GT Stn 37975, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Coll Comp, KACB, Atlanta, GA 30332 USA
[3] Georgia Inst Technol, Coll Comp, Georgia Tech Stn 329544, Atlanta, GA 30332 USA
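Funding
U.S. National Science Foundation
Keywords
State monitoring; datacenter; cloud; distributed; aggregation; tuning
DOI
10.1109/TKDE.2011.70
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract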
Monitoring global states of a distributed cloud application is a critical functionality for cloud datacenter management. State monitoring requires meeting two demanding objectives: a high level of correctness, which ensures a zero or low error rate, and high communication efficiency, which demands minimal communication cost in detecting state updates. Most existing work follows an instantaneous model which triggers state alerts whenever a constraint is violated. This model may cause frequent and unnecessary alerts due to momentary value bursts and outliers. Countermeasures taken in response to such alerts may in turn cause problematic operations. In this paper, we present a WIndow-based StatE monitoring (WISE) framework for efficiently managing cloud applications. Window-based state monitoring reports alerts only when a state violation is continuous within a time window. We show that it is not only more resilient to value bursts and outliers, but also able to save considerable communication when implemented in a distributed manner based on four technical contributions. First, we present the architectural design and deployment options for window-based state monitoring with centralized parameter tuning. Second, we develop a new distributed parameter tuning scheme enabling WISE to scale to many more monitoring nodes, as each node tunes its monitoring parameters reactively without global information. Third, we introduce two optimization techniques, including their design rationale, correctness and usage model, to further reduce the communication cost. Finally, we provide an in-depth empirical study of the scalability of WISE, and evaluate the improvement brought by the distributed tuning scheme and the two performance optimizations. Our results show that WISE reduces communication by 50-90 percent compared with instantaneous monitoring approaches, and the improved WISE gains a clear scalability advantage over its centralized version.
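The difference between the instantaneous model and the window-based model described in the abstract can be illustrated with a minimal sketch. It is not the WISE algorithm: it looks at a single, already-aggregated value stream and ignores the distributed coordination and parameter tuning that the paper addresses. The function names, the threshold, the window length, and the sample trace are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def instantaneous_alerts(values: Iterable[float], threshold: float) -> List[int]:
    """Return every time index at which the value exceeds the threshold."""
    return [t for t, v in enumerate(values) if v > threshold]


def window_alerts(values: Iterable[float], threshold: float, window: int) -> List[int]:
    """Return time indices at which the value has exceeded the threshold
    for at least `window` consecutive samples, counting the current one."""
    alerts = []
    run = 0  # length of the current run of consecutive violations
    for t, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= window:
            alerts.append(t)
    return alerts


# Hypothetical trace: a momentary burst at t=2, then a sustained violation at t=5..8.
trace = [10, 12, 95, 11, 13, 90, 92, 91, 94, 12]
print(instantaneous_alerts(trace, threshold=80))     # [2, 5, 6, 7, 8]
print(window_alerts(trace, threshold=80, window=3))  # [7, 8]
```

In this sketch the burst at t=2 raises an instantaneous alert but no window-based alert, which mirrors the resilience to momentary bursts and outliers that the abstract claims for window-based monitoring.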
Pages: 1328-1344
Number of pages: 17