Detecting performance anomalies in large-scale software systems using entropy

Cited by: 1
Authors
Malik, Haroon [1]
Shakshuki, Elhadi M. [2]
Affiliations
[1] Marshall Univ, Weisberg Div Comp Sci, Huntington, WV 25755 USA
[2] Acadia Univ, Jodrey Sch Comp Sci, Wolfville, NS, Canada
Keywords
Performance counters; Large-scale systems; Data center; Performance; Load test
DOI
10.1007/s00779-017-1036-y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Large-scale software systems (LSSs) are composed of hundreds of subsystems that interact with each other in unforeseen and complex ways. The operators of these LSSs strictly monitor thousands of metrics (performance counters) to quickly identify performance anomalies before a catastrophe occurs. Existing monitoring tools and methodologies have not kept pace with the rapid growth and inherent complexity of these LSSs and are therefore ineffective in helping practitioners pinpoint performance anomalies. We propose two methodologies that use an entropy measure to assist practitioners/operators of LSSs in quickly detecting both system-wide and underlying localized subsystem anomalies. Our performance tests conducted on an open-source benchmark system reveal that the proposed methodologies are robust in pinpointing anomalies, do not require any domain knowledge to operate, and avoid information overload on practitioners.
Pages: 1127-1137
Page count: 11
Related papers
50 in total
  • [41] Scale-free feature and evolving model of large-scale software systems
    Institute of Contemporary Manufacturing Engineering, Zhejiang University, Hangzhou 310027, China
    Wuli Xuebao, 2006, (08) : 3799 - 3804
  • [42] The scale-free feature and evolving model of large-scale software systems
    Yan Dong
    Qi Guo-Ning
    ACTA PHYSICA SINICA, 2006, 55 (08) : 3799 - 3804
  • [43] Detecting Anomaly in Large-scale Network using Mobile Crowdsourcing
    Li, Yang
    Sun, Jiachen
    Huang, Wenguang
    Tian, Xiaohua
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2019), 2019, : 2179 - 2187
  • [44] Scale and Responsiveness in Large-Scale Software Development
    Olsson, Helena Holmstrom
    Sandberg, Anna Borjesson
    Bosch, Jan
    Alahyari, Hiva
    IEEE SOFTWARE, 2014, 31 (05) : 87 - 93
  • [45] Towards Detecting Patterns in Failure Logs of Large-Scale Distributed Systems
    Gurumdimma, Nentawe
    Jhumka, Arshad
    Liakata, Maria
    Chuah, Edward
    Browne, James
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 1052 - 1061
  • [46] TRANSCODE: Detecting Status Code Mapping Errors in Large-Scale Systems
    Tang, Wensheng
    Hu, Yikun
    Fan, Gang
    Yao, Peisen
    Wu, Rongxin
    Bai, Guangyuan
    Wang, Pengcheng
    Zhang, Charles
    2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING ASE 2021, 2021, : 829 - 841
  • [47] Towards a performance management architecture for large-scale distributed systems using RINA
    Thompson, Peter
    Davies, Neil
    2020 23RD CONFERENCE ON INNOVATION IN CLOUDS, INTERNET AND NETWORKS AND WORKSHOPS (ICIN 2020), 2020, : 29 - 34
  • [48] Towards large-scale entropy computations
    Karamanos, K
    Kotsireas, I
    COMPUTING ANTICIPATORY SYSTEMS, 2004, 718 : 385 - 391
  • [49] Effectively Detecting Operational Anomalies In Large-Scale IoT Data Infrastructures By Using A GAN-Based Predictive Model
    Chen, Peng
    Liu, Hongyun
    Xin, Ruyue
    Carval, Thierry
    Zhao, Jiale
    Xia, Yunni
    Zhao, Zhiming
    COMPUTER JOURNAL, 2022, 65 (11) : 2909 - 2925
  • [50] Hybrid decentralized maximum entropy control for large-scale dynamical systems
    Haddad, Wassim M.
    Hui, Qing
    Chellaboina, VijaySekhar
    Nersesov, Sergey G.
    NONLINEAR ANALYSIS-HYBRID SYSTEMS, 2007, 1 (02) : 244 - 263