Utilizing dynamically coupled cores to form a resilient chip multiprocessor

Cited by: 71
Authors
LaFrieda, Christopher [1]
Ipek, Engin [1]
Martinez, Jose F. [1]
Manohar, Rajit [1]
Affiliations
[1] Cornell Univ, Comp Syst Lab, Ithaca, NY 14853 USA
Source
37TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS | 2007
DOI
10.1109/DSN.2007.100
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Aggressive CMOS scaling will make future chip multiprocessors (CMPs) increasingly susceptible to transient faults, hard errors, manufacturing defects, and process variations. Existing fault-tolerant CMP proposals that implement dual modular redundancy (DMR) do so by statically binding pairs of adjacent cores via dedicated communication channels and buffers. This can result in unnecessary power and performance losses in cases where one core is defective (in which case the entire DMR pair must be disabled), or when cores exhibit different frequency or leakage characteristics due to process variations (in which case the pair runs at the speed of the slowest core). Static DMR also hinders power density/thermal management, as DMR pairs running code with similar power/thermal characteristics are necessarily placed next to each other on the die. We present dynamic core coupling (DCC), an architectural technique that allows arbitrary CMP cores to verify each other's execution while requiring no static core binding at design time or dedicated communication hardware. Our evaluation shows that the performance overhead of DCC over a CMP without fault tolerance is 3% on SPEC2000 benchmarks, and is within 5% for a set of scalable parallel scientific and data mining applications with up to eight threads (16 processors). Our results also show that DCC has the potential to significantly outperform existing static DMR schemes.
Pages: 317 / +
Page count: 2