Hora: Architecture-aware online failure prediction

Cited by: 39
Authors
Pitakrat, Teerat [1]
Okanovic, Dusan [1]
van Hoorn, Andre [1]
Grunske, Lars [2]
Affiliations
[1] Univ Stuttgart, Inst Software Technol, Reliable Software Syst, Stuttgart, Germany
[2] Humboldt Univ, Dept Comp Sci, Software Engn, Berlin, Germany
Keywords
Online failure prediction; Reliability; Component-based software systems; ERROR PROPAGATION; RELIABILITY; MODEL;
DOI
10.1016/j.jss.2017.02.041
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Complex software systems experience failures at runtime even though a lot of effort is put into the development and operation. Reactive approaches detect these failures after they have occurred and already caused serious consequences. In order to execute proactive actions, the goal of online failure prediction is to detect these failures in advance by monitoring the quality of service or the system events. Current failure prediction approaches look at the system or individual components as a monolith without considering the architecture of the system. They disregard the fact that the failure in one component can propagate through the system and cause problems in other components. In this paper, we propose a hierarchical online failure prediction approach, called HORA, which combines component failure predictors with architectural knowledge. The failure propagation is modeled using Bayesian networks which incorporate both prediction results and component dependencies extracted from the architectural models. Our approach is evaluated using Netflix's server-side distributed RSS reader application to predict failures caused by three representative types of faults: memory leak, system overload, and sudden node crash. We compare HORA to a monolithic approach and the results show that our approach can improve the area under the ROC curve by 9.9%. © 2017 The Authors. Published by Elsevier Inc.
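The idea of combining per-component predictions with dependency knowledge in a Bayesian network can be illustrated with a minimal sketch. This is not the HORA implementation: the component names, the probabilities, the `propagation` parameter, and the noisy-OR form of the conditional probabilities are all assumptions chosen for illustration, and inference is done by brute-force enumeration, which is only practical for a handful of components.

```python
from itertools import product


def failure_marginals(local_prob, depends_on, propagation=1.0):
    """Exact marginal failure probabilities in a small Bayesian network.

    local_prob:  component -> failure probability reported by its own
                 (hypothetical) component-level predictor
    depends_on:  component -> components it depends on (its parents)
    propagation: assumed probability that a failed dependency also
                 breaks the dependent component
    """
    names = list(local_prob)
    marginals = {name: 0.0 for name in names}

    # Enumerate every joint failed/healthy assignment of all components.
    for states in product([False, True], repeat=len(names)):
        failed = dict(zip(names, states))
        joint = 1.0
        for name in names:
            # Assumed noisy-OR table: a component stays healthy only if it
            # does not fail on its own and no failed dependency propagates.
            survive = 1.0 - local_prob[name]
            for dep in depends_on.get(name, []):
                if failed[dep]:
                    survive *= 1.0 - propagation
            p_fail = 1.0 - survive
            joint *= p_fail if failed[name] else 1.0 - p_fail
        # Add this assignment's probability to each failed component.
        for name in names:
            if failed[name]:
                marginals[name] += joint
    return marginals


# Hypothetical three-tier chain: frontend -> service -> database.
example = failure_marginals(
    local_prob={"database": 0.2, "service": 0.1, "frontend": 0.05},
    depends_on={"service": ["database"], "frontend": ["service"]},
)
for component, probability in example.items():
    print(f"{component}: {probability:.3f}")
```

In this sketch, a component far from the faulty one still receives an elevated failure probability because the risk travels along the dependency edges, which is the effect a purely monolithic predictor would miss.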
Pages: 669-685
Number of pages: 17
Related papers
50 in total
  • [31] Controlled systems, failure prediction and maintenance
    Langeron, Yves
    Fouladirad, Mitra
    Grall, Antoine
    IFAC PAPERSONLINE, 2016, 49 (12) : 805 - 808
  • [32] Drop-Shock Failure Prediction in Electronic Packages by Using Peridynamic Theory
    Agwai, Abigail
    Guven, Ibrahim
    Madenci, Erdogan
    IEEE TRANSACTIONS ON COMPONENTS PACKAGING AND MANUFACTURING TECHNOLOGY, 2012, 2 (03) : 439 - 447
  • [33] Failure prediction and simulation of the starting system
    Dziubinski, Mieczyslaw
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON RELIABILITY SYSTEMS ENGINEERING (ICRSE 2017), 2017,
  • [34] A new architecture for online error detection and isolation in network on chip
    Nehnouh, Chakib
    JOURNAL OF HIGH SPEED NETWORKS, 2020, 26 (04) : 307 - 323
  • [35] Process-aware FMEA framework for failure analysis in maintenance
    Battirola Filho, Julio Cesar
    Piechnicki, Flavio
    Rocha Loures, Eduardo de Freitas
    Portela Santos, Eduardo Alves
    JOURNAL OF MANUFACTURING TECHNOLOGY MANAGEMENT, 2017, 28 (06) : 822 - 848
  • [36] A Failure Prediction Strategy for Transistor Aging
    Yi, Hyunbean
    Yoneda, Tomokazu
    Inoue, Michiko
    Sato, Yasuo
    Kajihara, Seiji
    Fujiwara, Hideo
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2012, 20 (11) : 1951 - 1959
  • [37] Satellite Lifetime Prediction with Random Failure
    Zhao, Haitao
    Yang, Hui
    Xiong, Xiao
    PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON RELIABILITY, MAINTAINABILITY AND SAFETY (ICRMS'2016): INTEGRATING BIG DATA, IMPROVING RELIABILITY & SERVING PERSONALIZATION, 2016,
  • [38] GPU Architecture Aware Instruction Scheduling for Improving Soft-Error Reliability
    Lee H.
    Al Faruque M.A.
    IEEE Transactions on Multi-Scale Computing Systems, 2017, 3 (02) : 86 - 99
  • [39] Advance Prediction Method of Failure Consequence for Natural Gas Pipeline Soil Corrosion Leakage
    An, Jinyu
    Liu, Peng
    JOURNAL OF FAILURE ANALYSIS AND PREVENTION, 2021, 21 (06) : 2202 - 2214
  • [40] Failure prediction by relevance vector regression with improved quantum-inspired gravitational search
    Lou, Jungang
    Jiang, Yunliang
    Shen, Qing
    Wang, Ruiqin
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2018, 103 : 171 - 177