On Graceful Degradation of Chip Multiprocessors in Presence of Faults via Flexible Pooling of Critical Execution Units

Cited by: 0
Authors
Rodrigues, Rance [1]
Kundu, Sandip [1]
Affiliation
[1] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Source
2011 IEEE 17th International On-Line Testing Symposium (IOLTS) | 2011
Keywords
Reliability; fault tolerance; dynamic hardware sharing; critical instruction execution unit; performance impact
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
Reliability and manufacturability have emerged as dominant concerns for today's multi-billion-transistor chips. In this paper, we investigate how to degrade a chip multiprocessor (CMP) gracefully in the presence of faults, keeping its architected functionality intact at the expense of some loss of performance. The proposed solution involves sharing critical execution resources among cores to survive faults. Recent research has suggested that large datapath units, such as the FPU and integer division units, are good candidates for outsourcing execution to other working cores in the CMP. Here we focus on the relatively small but critically important integer ALU. Outsourcing ALU operations incurs a large performance penalty, so better solutions are needed to ensure survivability with minimal performance loss. We propose provisioning a shared ALU among a set of cores that can act as a spare for any constituent core in the group. This solution works well for single ALU failures, but leads to resource contention when multiple ALUs fail. Simulation case studies on the MediaBench and MiBench benchmarks show that the proposed solution allows the CMP to remain functionally intact, with no performance penalty for single ALU failures and no more than 1.5% performance loss on average when a single ALU fails in each core.
Pages: 6