Reduced-precision Algorithm-based Fault Tolerance for FPGA-implemented Accelerators

Cited by: 0

Authors:

Davis, James J. [1]

Cheung, Peter Y. K. [1]

Affiliations:

[1] Imperial Coll London, London SW7 2AZ, England

Source:

APPLIED RECONFIGURABLE COMPUTING, ARC 2016 | 2016

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

DOI:

10.1007/978-3-319-30481-6_31

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

As the threat of fault susceptibility caused by mechanisms including variation and degradation increases, engineers must give growing consideration to error detection and correction. While the use of common fault tolerance strategies frequently causes the incursion of significant overheads in area, performance and/or power consumption, options exist that buck these trends. In particular, algorithm-based fault tolerance embodies a proven family of low-overhead error mitigation techniques able to be built upon to create self-verifying circuitry. In this paper, we present our research into the application of algorithm-based fault tolerance (ABFT) in FPGA-implemented accelerators at reduced levels of precision. This allows for the introduction of a previously unexplored tradeoff: sacrificing the observability of faults associated with low-magnitude errors for gains in area, performance and efficiency by reducing the bit-widths of logic used for error detection. We describe the implementation of a novel checksum truncation technique, analysing its effects upon overheads and allowed error. Our findings include that bit-width reduction of ABFT circuitry within a fault-tolerant accelerator used for multiplying pairs of 32 × 32 matrices resulted in the reduction of incurred area overhead by 16.7% and recovery of 8.27% of timing model f_max. These came at the cost of introducing average and maximum absolute output errors of 0.430% and 0.927%, respectively, of the maximum absolute output value under transient fault injection.
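
The tradeoff described in the abstract can be illustrated with a small software sketch. It is not the authors' circuit: it assumes integer matrices, uses the classic column-checksum form of ABFT for matrix multiplication, and models "reduced precision" simply by discarding the lowest bits of each checksum before comparison. The function names `matmul` and `column_checksum_check` and the parameter `drop_bits` are hypothetical, chosen only for this illustration.

```python
def matmul(a, b):
    """Plain integer matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def column_checksum_check(a, b, c, drop_bits=0):
    """Return True if c passes a column-checksum test against a * b.

    drop_bits is an assumed knob for this sketch: the lowest drop_bits
    bits of every checksum are discarded before comparison, standing in
    for narrower error-detection logic.
    """
    # Predicted checksum row: (sum of the rows of a) multiplied by b.
    col_sum_a = [sum(row[k] for row in a) for k in range(len(b))]
    predicted = [sum(col_sum_a[k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))]
    # Observed checksum row: column sums of the computed result c.
    observed = [sum(row[j] for row in c) for j in range(len(c[0]))]
    # Compare only the retained high-order bits.
    return all((p >> drop_bits) == (o >> drop_bits)
               for p, o in zip(predicted, observed))


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    c[0][0] += 1  # inject a low-magnitude error into one output element
    print("full-width check passes:", column_checksum_check(a, b, c))
    print("truncated check passes: ", column_checksum_check(a, b, c, drop_bits=4))
```

In this sketch, any error whose effect on a column checksum is at least 2^drop_bits is still caught, while smaller errors may pass unnoticed. That mirrors, in a very simplified way, the exchange of fault observability for narrower detection logic that the abstract describes.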


Pages: 361-368

Page count: 8
