Application-level fault tolerance as a complement to system-level fault tolerance

Cited by: 14
Authors
Haines, J [1]
Lakamraju, V [1]
Koren, I [1]
Krishna, CM [1]
Affiliations
[1] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Keywords
distributed real-time systems; fault tolerance; checkpointing; imprecise computation; target tracking; beam forming
DOI
10.1023/A:1008181429693
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique which is applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance involves reliability techniques incorporated within the system hardware and software whereas application-level fault tolerance involves reliability techniques incorporated within the application software. We assert that, for high reliability, a combination of system-level fault tolerance and application-level fault tolerance works best. In many systems, application-level fault tolerance can be used to bridge the gap when system-level fault tolerance alone does not provide the required reliability. We exemplify this with the RTHT target tracking benchmark and the ABF beamforming benchmark.
Pages: 53-68
Page count: 16
Source
Joshua Haines, Vijay Lakamraju, Israel Koren, C. Mani Krishna. The Journal of Supercomputing, 2000, 16: 53-68