Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators

Authors
Davis, James J. [1]
Cheung, Peter Y. K. [1]
Affiliations
[1] Imperial Coll London, London SW7 2AZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
DOI
10.1007/978-3-319-30481-6_31
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As the threat of fault susceptibility caused by mechanisms including variation and degradation increases, engineers must give growing consideration to error detection and correction. While the use of common fault tolerance strategies frequently causes the incursion of significant overheads in area, performance and/or power consumption, options exist that buck these trends. In particular, algorithm-based fault tolerance embodies a proven family of low-overhead error mitigation techniques able to be built upon to create self-verifying circuitry. In this paper, we present our research into the application of algorithm-based fault tolerance (ABFT) in FPGA-implemented accelerators at reduced levels of precision. This allows for the introduction of a previously unexplored tradeoff: sacrificing the observability of faults associated with low-magnitude errors for gains in area, performance and efficiency by reducing the bit-widths of logic used for error detection. We describe the implementation of a novel checksum truncation technique, analysing its effects upon overheads and allowed error. Our findings include that bit-width reduction of ABFT circuitry within a fault-tolerant accelerator used for multiplying pairs of 32 × 32 matrices resulted in the reduction of incurred area overhead by 16.7% and recovery of 8.27% of timing model f_max. These came at the cost of introducing average and maximum absolute output errors of 0.430% and 0.927%, respectively, of the maximum absolute output value under transient fault injection.
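The tradeoff described in the abstract can be illustrated with a small software model. The sketch below is not the paper's circuit or its truncation scheme, which this record does not detail. It assumes integer matrices and a classic column-checksum ABFT check for matrix multiplication, and it narrows the checksum path by dropping the low `drop_bits` bits of the checksum vector. The function names, the `drop_bits` parameter and the detection bound are all hypothetical choices made for illustration.

```python
def matmul(a, b):
    """Plain integer matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def truncated_column_check(a, b, c, drop_bits):
    """Return indices of columns of c flagged as faulty.

    Assumption (not from the paper): the checksum path keeps only the
    high bits of A's column checksums, so mismatches no larger than the
    worst-case truncation error cannot be told apart from a fault-free run.
    """
    # Column checksums of A, stored at reduced precision.
    chk_a = [sum(col) >> drop_bits for col in zip(*a)]
    # Column sums of C as predicted by the narrow checksum vector.
    predicted = [sum(x * y for x, y in zip(chk_a, col)) << drop_bits
                 for col in zip(*b)]
    faulty = []
    for j, (col_c, col_b) in enumerate(zip(zip(*c), zip(*b))):
        # Largest mismatch that truncation alone can cause in column j.
        bound = ((1 << drop_bits) - 1) * sum(abs(v) for v in col_b)
        if abs(sum(col_c) - predicted[j]) > bound:
            faulty.append(j)
    return faulty


if __name__ == "__main__":
    a = [[3, 5], [7, 2]]
    b = [[4, 1], [6, 8]]
    c = matmul(a, b)
    c[0][0] += 64  # inject a large-magnitude error into one output
    print(truncated_column_check(a, b, c, drop_bits=1))
```

With `drop_bits=0` the bound is zero and the check is exact. Raising `drop_bits` narrows the checksum arithmetic but widens the band of low-magnitude errors that pass undetected, which mirrors the observability-for-overhead trade the abstract describes.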
Pages: 361-368
Page count: 8