Microcontroller Compiler-Assisted Software Fault Tolerance

Cited by: 24
Authors
Bohman, Matthew [1, 2]
James, Benjamin [1, 2]
Wirthlin, Michael J. [1, 2]
Quinn, Heather [3]
Goeders, Jeffrey [1, 2]
Affiliations
[1] Brigham Young Univ, Dept Elect & Comp Engn, Provo, UT 84602 USA
[2] NSF Ctr Space High Performance & Resilient Comp, Provo, UT 84602 USA
[3] Los Alamos Natl Lab, ISR 3 Space Data Syst, Los Alamos, NM 87545 USA
Funding
National Science Foundation (NSF)
Keywords
Silent data corruption (SDC); single-event upset (SEU); soft errors; software fault tolerance
DOI
10.1109/TNS.2018.2886094
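Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract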
Commercial off-the-shelf microcontrollers can be useful for noncritical processing on spaceborne platforms. These microprocessors can be inexpensive and consume little power. However, the software running on them is vulnerable to radiation-induced upsets. In this paper, we present a fully automated, configurable, software-based tool to increase the reliability of microprocessors in high-radiation environments. The tool consists of a set of open-source LLVM compiler passes that automatically apply software-based mitigation techniques. During compilation, we duplicate or triplicate computations and insert voting mechanisms into the software, allowing errors to be corrected at runtime. While the techniques we implement are not novel, previous implementations have typically been closed source, dependent on a particular processor architecture, not automated, and not tested in real high-radiation environments. In contrast, the compiler passes presented in this paper are publicly available, highly customizable, and both platform independent and language independent. We tested the modified software using both fault injection and neutron beam radiation on a Texas Instruments MSP430 microcontroller. Under neutron beam testing, we decreased the cross section of programs by 17-29x, increasing mean work to failure (MWTF) by 4-7x.
Pages: 223-232
Number of pages: 10