Data-driven mixed-Integer linear programming-based optimisation for efficient failure detection in large-scale distributed systems

被引:4
作者
Er-Rahmadi, Btissam [1 ,3 ]
Ma, Tiejun [1 ,2 ]
机构
[1] Univ Southampton, Southampton Business Sch, Dept Decis Analyt & Risk, Ctr Risk Res, Bldg 2,Univ Rd, Southampton SO17 1BJ, Hants, England
[2] Univ Edinburgh, Artificial Intelligence Applicat Inst, Sch Informat Informat Forum, Crichton St, Edinburgh EH8 9AB, Midlothian, Scotland
[3] Edinburgh Res Ctr, Huawei Technol R&D, 2 Semple St, Edinburgh EH3 8BL, Midlothian, Scotland
关键词
Nonlinear programming; Mixed integer linear programming; Distributed systems; Failure detection; Heartbeats; RESOURCE-MANAGEMENT; NETWORK; SERVICE; EXPLORATION; FRAMEWORK; CONSENSUS; QUALITY; MODELS;
D O I
10.1016/j.ejor.2022.02.006
中图分类号
C93 [管理学];
学科分类号
12 ; 1201 ; 1202 ; 120202 ;
摘要
Failure detectors (FDs) are fundamental building blocks for distributed systems. An FD detects whether a process has crashed or not based on the reception of heartbeats' messages sent by this process over a communication channel. A key challenge of FDs is to tune their parameters to achieve optimal performance which satisfies the desired system requirements. This is challenging due to the complexities of large-scale networks. Existing FDs ignore such optimisation and adopt ad-hoc parameters. In this paper, we propose a new Mixed Integer Linear Programming (MILP) optimisation-based FD algorithm. We obtain the MILP formulation via piecewise linearisation relaxations. The MILP involves obtaining optimal FD parameters that meet the optimal trade-off between its performance metrics requirements, network conditions and system parameters. The MILP maximises our FD's accuracy under bounded failure detection time while considering network and system conditions as constraints. The MILP's solution represents optimised FD parameters that maximise FD's expected performance. To adapt to real-time network changes, our proposed MILP-based FD fits the probability distribution of heartbeats' inter-arrivals. To address our FD scalability challenge in large-scale systems where the MILP model needs to compute approximate optimal solutions quickly, we also propose a heuristic algorithm. To test our proposed approach, we adopt Amazon Cloud as a realistic testing environment and develop a simulator for robustness tests. Our results show consistent improvement of overall FD performance and scalability. To the best of our knowledge, this is the first attempt to combine the MILP-based optimisation modelling with FD to achieve performance guarantees.
引用
收藏
页码:337 / 353
页数:17
相关论文
共 83 条
  • [61] Computing tight bounds via piecewise linear functions through the example of circle cutting problems
    Rebennack, Steffen
    [J]. MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2016, 84 (01) : 3 - 57
  • [62] Combining sampling-based and scenario-based nested Benders decomposition methods: application to stochastic dual dynamic programming
    Rebennack, Steffen
    [J]. MATHEMATICAL PROGRAMMING, 2016, 156 (1-2) : 343 - 389
  • [63] Multiobjective Reinforcement Learning for Cognitive Satellite Communications Using Deep Neural Network Ensembles
    Rodrigues Ferreira, Paulo Victor
    Paffenroth, Randy
    Wyglinski, Alexander M.
    Hackett, Timothy M.
    Bilen, Sven G.
    Reinhart, Richard C.
    Mortensen, Dale J.
    [J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2018, 36 (05) : 1030 - 1041
  • [64] Ross S. M., 1996, Stochastic Processes
  • [65] Satzger Benjamin, 2011, International Journal of Automomous and Adaptive Communications Systems, V4, P61, DOI 10.1504/IJAACS.2011.037749
  • [66] Satzger B, 2007, APPLIED COMPUTING 2007, VOL 1 AND 2, P551, DOI 10.1145/1244002.1244129
  • [67] Stochastic Modeling and Approaches for Managing Energy Footprints in Cloud Computing Service
    Shen, Siqian
    Wang, Jue
    [J]. SERVICE SCIENCE, 2014, 6 (01) : 15 - 33
  • [68] Maintaining Secure and Reliable Distributed Control Systems
    Sleptchenko, Andrei
    Johnson, M. Eric
    [J]. INFORMS JOURNAL ON COMPUTING, 2015, 27 (01) : 103 - 117
  • [69] TABLE FOR ESTIMATING THE GOODNESS OF FIT OF EMPIRICAL DISTRIBUTIONS
    SMIRNOV, N
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1948, 19 (02): : 279 - 279
  • [70] ADAPTATION - Algorithms to ADAPTive FAulT MonItOriNg and their implementation on CORBA
    Sotoma, I
    Madeira, ERM
    [J]. DOA'01: 3RD INTERNATIONAL SYMPOSIUM ON DISTRIBUTED OBJECTS & APPLICATIONS, PROCEEDINGS, 2001, : 219 - 228