Evaluating reliability improvements of fault tolerant array processors using algorithm-based fault tolerance

Cited by: 1
Authors
Tao, DL
Kantawala, K
Affiliation
[1] Department of Electrical Engineering, College of Engineering and Applied Science, State University of New York, Stony Brook
Keywords
array processors; error detecting/correcting codes; fault tolerance; multiprocessor system; reliability;
DOI
10.1109/12.600889
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Algorithm-based fault tolerance (ABFT) is used to provide low-cost error protection for VLSI processor arrays used in real-time digital signal processing. The main objective of incorporating an ABFT technique in a processor array is to improve its reliability. All previous approaches to ABFT have been evaluated in terms of their error detecting/correcting capabilities; the reliability improvement has never been addressed. In this paper, we develop a stochastic model for an array processor incorporating ABFT that takes the behavior of transient/intermittent failures and hardware overhead into account. This model is then used to evaluate the reliability and reliability improvements of several existing ABFT techniques that tolerate single faults. A user can therefore evaluate a number of ABFT techniques and make a trade-off between reliability and cost prior to implementation. Moreover, we have conducted extensive simulation experiments, and the simulation results validate the proposed model.
Pages: 725-730
Number of pages: 6