Application-level fault tolerance as a complement to system-level fault tolerance

Cited by: 14
Authors
Haines, J [1]
Lakamraju, V [1]
Koren, I [1]
Krishna, CM [1]
Affiliation
[1] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Keywords
distributed real-time systems; fault tolerance; checkpointing; imprecise computation; target tracking; beam forming
DOI
10.1023/A:1008181429693
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique that is applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance involves reliability techniques incorporated within the system hardware and software, whereas application-level fault tolerance involves reliability techniques incorporated within the application software. We assert that, for high reliability, a combination of system-level fault tolerance and application-level fault tolerance works best. In many systems, application-level fault tolerance can be used to bridge the gap when system-level fault tolerance alone does not provide the required reliability. We illustrate this with the RTHT target tracking benchmark and the ABF beamforming benchmark.
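The abstract's central idea, that application-level techniques can absorb failures the system level does not cover, can be illustrated with a small sketch. The Python below is a generic, hypothetical example and not the method from the paper: a data-parallel mean in which a failed worker's partition is simply dropped, so the application returns an imprecise result together with its coverage, and recovery is escalated to the system level (for example, a checkpoint restart) only when too little data survives. The names `tolerant_mean`, `PartialResult`, `Worker`, `min_coverage`, and the simulated `flaky_sum` worker are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PartialResult:
    value: float     # estimate computed from the surviving partitions only
    coverage: float  # fraction of input samples that contributed (1.0 = precise)


# Hypothetical per-processor task: (partition index, data) -> partial sum.
# A processor failure is modelled as the task raising RuntimeError.
Worker = Callable[[int, Sequence[float]], float]


def tolerant_mean(
    partitions: Sequence[Sequence[float]],
    worker: Worker,
    min_coverage: float = 0.5,
) -> PartialResult:
    """Data-parallel mean that degrades to an imprecise result on failures."""
    total = sum(len(p) for p in partitions)
    acc = 0.0
    used = 0
    for idx, part in enumerate(partitions):
        try:
            acc += worker(idx, part)
            used += len(part)
        except RuntimeError:
            # Application-level response: skip the lost partition rather than
            # abort the whole computation.
            continue
    coverage = used / total if total else 0.0
    if used == 0 or coverage < min_coverage:
        # Too little data survived: hand off to system-level recovery
        # (e.g., restart from a checkpoint) instead of returning an estimate.
        raise RuntimeError(f"coverage {coverage:.2f} is below {min_coverage}")
    return PartialResult(value=acc / used, coverage=coverage)


def flaky_sum(idx: int, part: Sequence[float]) -> float:
    """Simulated worker whose processor for partition 1 has failed."""
    if idx == 1:
        raise RuntimeError("processor down")
    return sum(part)


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
    result = tolerant_mean(data, flaky_sum)
    print(result)  # value=4.25 from 4 of 6 samples; the precise mean is 4.0
```

The 0.5 coverage threshold is an arbitrary placeholder; in practice it would be derived from the application's accuracy requirements, which is where the division of labour between application-level and system-level fault tolerance gets decided.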
Pages: 53-68
Page count: 16