Murphy: Performance Diagnosis of Distributed Cloud Applications

Cited by: 1
Authors
Harsh, Vipul [1, 2]
Zhou, Wenxuan [2]
Ashok, Sachin [1]
Mysore, Radhika Niranjan [3]
Godfrey, P. Brighten [1, 2]
Banerjee, Sujata [3]
Affiliations
[1] Univ Illinois, Chicago, IL 60680 USA
[2] VMware, Palo Alto, CA 94304 USA
[3] VMware Res, Palo Alto, CA USA
Source
PROCEEDINGS OF THE ACM SIGCOMM 2023 CONFERENCE, SIGCOMM 2023 | 2023
Keywords
performance diagnosis; cyclic dependencies; enterprise networks; microservices
DOI
10.1145/3603269.3604877
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Modern cloud-based applications have complex inter-dependencies on both distributed application components and network infrastructure, making it difficult to reason about their performance. As a result, a rich body of work seeks to automate performance diagnosis of enterprise networks and such cloud applications. However, existing methods either ignore inter-dependencies, which results in poor accuracy, or require causal acyclic dependencies, which cannot model common enterprise environments. We describe the design and implementation of Murphy, an automated performance diagnosis system that can work with commonly available telemetry in practical enterprise environments while achieving high accuracy. Murphy utilizes loosely defined associations between entities obtained from commonly available monitoring data. Its learning algorithm is based on a Markov Random Field (MRF) that can take advantage of such loose associations to reason about how entities affect each other in the context of a specific incident. We evaluate Murphy in an emulated microservice environment and in real incidents from a large enterprise. Compared to past work, Murphy reduces diagnosis error by approximately 1.35x in the restrictive environments supported by past work, and by at least 4.7x in more general environments.
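As a rough illustration of why an MRF suits loosely defined, possibly cyclic associations, the following minimal sketch computes exact marginals on a three-entity pairwise MRF whose association graph forms a cycle. It is not Murphy's algorithm: the entity names, the potentials, and the helper names `log_score` and `marginals` are all hypothetical, chosen only to show that an undirected model needs no acyclic causal ordering.

```python
from itertools import product
import math

# Hypothetical entities and undirected associations; note the cycle.
NODES = ["frontend", "cache", "db"]
EDGES = [("frontend", "cache"), ("cache", "db"), ("db", "frontend")]

# Assumed unary log-potentials for state 1 (anomalous); state 0 scores 0.
# Here only "db" has direct evidence of being anomalous.
UNARY = {"frontend": 0.0, "cache": 0.0, "db": 2.0}
COUPLING = 1.0  # assumed reward when two associated entities share a state


def log_score(assignment):
    """Unnormalized log-probability of one joint assignment."""
    score = sum(UNARY[n] * assignment[n] for n in NODES)
    score += sum(COUPLING for a, b in EDGES if assignment[a] == assignment[b])
    return score


def marginals():
    """Exact P(node is anomalous) by enumerating all joint states."""
    totals = {n: 0.0 for n in NODES}
    z = 0.0  # partition function
    for states in product([0, 1], repeat=len(NODES)):
        assignment = dict(zip(NODES, states))
        w = math.exp(log_score(assignment))
        z += w
        for n in NODES:
            totals[n] += w * assignment[n]
    return {n: totals[n] / z for n in NODES}


if __name__ == "__main__":
    for name, p in marginals().items():
        print(f"{name}: {p:.3f}")
```

Brute-force enumeration is only feasible for a handful of entities; a realistic deployment would need approximate inference, which this sketch does not attempt.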
Pages: 438-451
Page count: 14