Cost-Effective Error Detection Through Mersenne Modulo Shadow Datapaths

Times cited: 0
Authors
Campbell, Keith [1]
Lin, Chen-Hsuan [1]
Chen, Deming [1]
Affiliation
[1] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61801 USA
Keywords
Cost-effective; error detection; functional unit; gate-level; Mersenne number; modulo arithmetic; reliability; shadow datapath; single-event upset; fault tolerance; adders
DOI
10.1109/TCAD.2018.2834417
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With technology scaling leading to reliability problems and with the proliferation of hardware accelerators, there is a need for cost-effective techniques to detect errors in complex datapaths. Modulo (residue) arithmetic is useful for creating a shadow datapath that checks the computation of an arithmetic datapath. It involves three key steps: 1) reducing the inputs to modulo shadow values; 2) computing with those shadow values; and 3) checking the outputs for consistency with the shadow outputs. This paper focuses on new gate-level architectures and algorithms that reduce the cost of modulo shadow datapaths. We introduce new low-cost architectures for the functional units that perform the reduction, shadow computation, and checking operations. Comparing our functional units with the previous state-of-the-art approach, we observe a 12.5% reduction in area and a 47.1% reduction in delay for a 32-bit mod-3 reducer. We also find that the costs of our reducers, which tend to dominate shadow datapath costs, do not increase with larger modulo bases, and that for modulo-15 and above, all of our functional units have better area and delay than their previous counterparts. To demonstrate the cost-effectiveness of our approach in computation-intensive accelerator applications, we design custom pipelined shadow datapaths for five compound functional units implementing a variety of vector and matrix operations. For a 32-bit main datapath and a 2-bit shadow datapath, we observe area costs of 6%-10% and reliability improvements of 3x-61x against single-event transient errors. For an 8-bit shadow datapath, we observe area costs of 15%-20% and reliability gains of 121x-2477x.
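As an illustration of the three steps (reduce, compute on shadow values, check), the following Python sketch models a Mersenne-modulus shadow datapath in software for a dot product. It is a minimal behavioral sketch, not the paper's gate-level reducer, shadow-unit, or checker architecture; the names `mersenne_reduce`, `shadow_dot`, and `check` are hypothetical, and the main datapath is assumed to keep its full-width result (no overflow truncation). The reduction uses the identity 2^k ≡ 1 (mod 2^k - 1), so a value can be reduced by adding its k-bit chunks rather than by dividing.

```python
from typing import Sequence


def mersenne_reduce(x: int, k: int) -> int:
    """Reduce a non-negative integer x modulo the Mersenne number m = 2**k - 1.

    Since 2**k == 1 (mod m), x is congruent to the sum of its k-bit chunks,
    so repeatedly folding the chunks together (end-around-carry addition)
    replaces a division.
    """
    if x < 0 or k < 1:
        raise ValueError("expected x >= 0 and k >= 1")
    m = (1 << k) - 1
    while x > m:
        total = 0
        while x:
            total += x & m  # take the lowest k-bit chunk
            x >>= k
        x = total
    # x == m is the redundant all-ones encoding of zero
    return 0 if x == m else x


def shadow_dot(a: Sequence[int], b: Sequence[int], k: int) -> int:
    """Shadow datapath (hypothetical): a dot product computed only on k-bit residues."""
    acc = 0
    for x, y in zip(a, b):
        # Step 1: reduce the inputs; step 2: multiply and accumulate residues.
        prod = mersenne_reduce(mersenne_reduce(x, k) * mersenne_reduce(y, k), k)
        acc = mersenne_reduce(acc + prod, k)
    return acc


def check(main_result: int, shadow_result: int, k: int) -> bool:
    """Step 3 (hypothetical checker): the reduced main output must match the shadow output."""
    return mersenne_reduce(main_result, k) == shadow_result


if __name__ == "__main__":
    a = [123456789, 42, 7]
    b = [987654321, 1000, 3]
    main = sum(x * y for x, y in zip(a, b))  # fault-free main datapath result
    for k in (2, 4, 8):  # shadow widths 2, 4, 8 bits -> moduli 3, 15, 255
        s = shadow_dot(a, b, k)
        # The fault-free result passes; a result with bit 5 flipped fails.
        print(k, check(main, s, k), check(main ^ (1 << 5), s, k))
```

A single-bit error in the main result changes it by ±2^j, which is never a multiple of 2^k - 1 when k ≥ 2, so this model always flags it. An error that happens to be a multiple of the modulus (for example, +3 under mod 3) escapes the check, which is consistent with wider shadow datapaths yielding larger reliability gains.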
Pages: 1056-1069
Number of pages: 14