A Survey of Techniques for Modeling and Improving Reliability of Computing Systems

Cited by: 54
Authors
Mittal, Sparsh [1]
Vetter, Jeffrey S. [1, 2]
Affiliations
[1] Oak Ridge Natl Lab, Future Technol Grp, Oak Ridge, TN 37830 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
Review; classification; reliability; resilience; fault-tolerance; vulnerability; architectural vulnerability factor; soft/transient error; architectural techniques; ARCHITECTURAL VULNERABILITY FACTOR; ERROR PROTECTION; CACHE; PERFORMANCE; MEMORY; ENERGY; ENHANCEMENT; FAILURES; REDUCE; COST
DOI
10.1109/TPDS.2015.2426179
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Aggressive technology scaling has greatly increased the occurrence and impact of faults in computing systems, making reliability a first-order design constraint. Several techniques have been proposed to address this challenge. This paper surveys architectural techniques for improving the resilience of computing systems. We focus especially on techniques proposed for microarchitectural components such as processor registers, functional units, caches, and main memory. In addition, we discuss techniques proposed for non-volatile memory, GPUs, and 3D-stacked processors. To underscore the similarities and differences among the techniques, we classify them based on their key characteristics. We also review the metrics proposed to quantify the vulnerability of processor structures. We believe that this survey will help researchers, system architects, and processor designers gain insight into techniques for improving the reliability of computing systems.
Pages: 1226-1238
Number of pages: 13