Hora: Architecture-aware online failure prediction

Cited by: 39
Authors
Pitakrat, Teerat [1]
Okanovic, Dusan [1]
van Hoorn, Andre [1]
Grunske, Lars [2]
Affiliations
[1] Univ Stuttgart, Inst Software Technol, Reliable Software Syst, Stuttgart, Germany
[2] Humboldt Univ, Dept Comp Sci, Software Engn, Berlin, Germany
Keywords
Online failure prediction; Reliability; Component-based software systems; ERROR PROPAGATION; RELIABILITY; MODEL;
DOI
10.1016/j.jss.2017.02.041
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Complex software systems experience failures at runtime even though a lot of effort is put into the development and operation. Reactive approaches detect these failures after they have occurred and already caused serious consequences. In order to execute proactive actions, the goal of online failure prediction is to detect these failures in advance by monitoring the quality of service or the system events. Current failure prediction approaches look at the system or individual components as a monolith without considering the architecture of the system. They disregard the fact that the failure in one component can propagate through the system and cause problems in other components. In this paper, we propose a hierarchical online failure prediction approach, called HORA, which combines component failure predictors with architectural knowledge. The failure propagation is modeled using Bayesian networks which incorporate both prediction results and component dependencies extracted from the architectural models. Our approach is evaluated using Netflix's server-side distributed RSS reader application to predict failures caused by three representative types of faults: memory leak, system overload, and sudden node crash. We compare HORA to a monolithic approach and the results show that our approach can improve the area under the ROC curve by 9.9%. (C) 2017 The Authors. Published by Elsevier Inc.
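The abstract's core idea, combining per-component failure predictions with dependency information so that a predicted failure propagates to dependent components, can be illustrated with a minimal sketch. This is not the paper's implementation: it replaces full Bayesian network inference with a simplified noisy-OR rule that treats dependencies as independent, and the component names, probabilities, and dependency weights below are hypothetical values chosen only for illustration.

```python
from typing import Dict, List, Tuple


def propagate_failure_probabilities(
    own_predictions: Dict[str, float],
    dependencies: Dict[str, List[Tuple[str, float]]],
) -> Dict[str, float]:
    """Combine per-component predictions with dependency weights.

    own_predictions: failure probability reported by each component's
        own predictor (hypothetical inputs).
    dependencies: for each component, a list of (dependency, weight)
        pairs, where weight is the assumed probability that a failure
        of the dependency causes the component to fail.

    Simplifying assumption: noisy-OR with independent dependencies,
    not exact Bayesian network inference. The graph must be acyclic.
    """
    result: Dict[str, float] = {}
    in_progress = set()

    def visit(component: str) -> float:
        if component in result:
            return result[component]
        if component in in_progress:
            raise ValueError(f"cyclic dependency at {component}")
        in_progress.add(component)
        # Probability that the component does not fail on its own ...
        survive = 1.0 - own_predictions.get(component, 0.0)
        # ... and that no dependency failure propagates to it.
        for dependency, weight in dependencies.get(component, []):
            survive *= 1.0 - weight * visit(dependency)
        in_progress.discard(component)
        result[component] = 1.0 - survive
        return result[component]

    for component in set(own_predictions) | set(dependencies):
        visit(component)
    return result


# Hypothetical three-tier system: frontend -> middletier -> database.
example_predictions = {"frontend": 0.01, "middletier": 0.02, "database": 0.5}
example_dependencies = {
    "frontend": [("middletier", 1.0)],
    "middletier": [("database", 0.8)],
}
probabilities = propagate_failure_probabilities(
    example_predictions, example_dependencies
)
for name in ("database", "middletier", "frontend"):
    print(f"{name}: {probabilities[name]:.4f}")
```

In this toy setup, a high predicted failure probability for the database raises the estimates for the middle tier and the frontend, even though their own predictors report low risk. A predictor that views each component in isolation would not capture that effect.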
Pages: 669-685
Number of pages: 17
Related papers
50 in total
  • [41] Failure time prediction for vehicle dynamics under performance degradation of dampers and track evolution
    Dai, Xinliang
    Wu, Pingbo
    Shi, Huailong
    Sui, Hao
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART O-JOURNAL OF RISK AND RELIABILITY, 2024, 238 (06) : 1256 - 1270
  • [42] HANA: A Human-Aware Negotiation Architecture
    Fabregues, Angela
    Sierra, Carles
    DECISION SUPPORT SYSTEMS, 2014, 60 : 18 - 28
  • [43] Towards Communication Profile, Topology and Node Failure aware Process Placement
    Vardas, Ioannis
    Ploumidis, Manolis
    Marazakis, Manolis
    2020 IEEE 32ND INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2020), 2020, : 241 - 248
  • [44] Koko: an architecture for affect-aware games
    Sollenberger, Derek J.
    Singh, Munindar P.
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2012, 24 (02) : 255 - 286
  • [45] Joint Online RUL Prediction for Multivariate Deteriorating Systems
    Peng, Weiwen
    Ye, Zhi-Sheng
    Chen, Nan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2019, 15 (05) : 2870 - 2878
  • [46] Prediction of Atomic Web Services Reliability for QoS-Aware Recommendation
    Silic, Marin
    Delac, Goran
    Srbljic, Sinisa
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2015, 8 (03) : 425 - 438
  • [47] Application of Machine Learning for Dragline Failure Prediction
    Taghizadeh, Amir
    Demirel, Nuray
    1ST SCIENTIFIC PRACTICAL CONFERENCE INTERNATIONAL INNOVATIVE MINING SYMPOSIUM (IN MEMORY OF PROF. VLADIMIR PRONOZA), 2017, 15
  • [48] Neural network approach for failure rate prediction
    Kutylowska, Malgorzata
    ENGINEERING FAILURE ANALYSIS, 2015, 47 : 41 - 48
  • [49] FAILURE PREDICTION OF UNDERGROUND DISTRIBUTION FEEDER CABLES
    BUCCI, RM
    REBBAPRAGADA, RV
    MCELROY, AJ
    CHEBLI, EA
    DRILLER, S
    IEEE TRANSACTIONS ON POWER DELIVERY, 1994, 9 (04) : 1943 - 1955
  • [50] Power Consumption Prediction and Power-Aware Packing in Consolidated Environments
    Choi, Jeonghwan
    Govindan, Sriram
    Jeong, Jinkyu
    Urgaonkar, Bhuvan
    Sivasubramaniam, Anand
    IEEE TRANSACTIONS ON COMPUTERS, 2010, 59 (12) : 1640 - 1654