Thread Relocation: A Runtime Architecture for Tolerating Hard Errors in Chip Multiprocessors

Cited by: 12
Authors
Khan, Omer [1]
Kundu, Sandip [2]
Affiliations
[1] Univ Massachusetts Amherst, Framingham, MA 01701 USA
[2] Univ Massachusetts Amherst, Amherst, MA 01002 USA
Funding
National Science Foundation (USA);
Keywords
Chip multiprocessor (CMP); hard-error tolerance; hardware/software codesign; hypervisor; virtualization; COMPONENTS;
DOI
10.1109/TC.2009.76
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
As the semiconductor industry continues its relentless push for nano-CMOS technologies, device reliability and occurrence of hard errors have emerged as a dominant concern in multicores. Although regular memory structures are protected against hard errors using error correcting codes or spare rows and columns, many of the structures within the cores are left unprotected. Even if the location of hard errors is known a priori, disabling faulty cores results in a substantial performance loss. Several proposed techniques use microarchitectural redundancy to allow defective cores to continue operation. These techniques are attractive, but limited due to either added cost of additional redundancy that offers no benefits to an error-free core, or limited coverage, due to the natural redundancy offered by the microarchitecture. We propose to exploit the intercore redundancy in chip multiprocessors for hard-error tolerance. Our scheme combines hardware reconfiguration to ensure reduced functionality of cores, and a runtime layer of software (microvisor) to manage mapping of threads to cores. Microvisor observes the changing phase behavior of threads and initiates thread relocation to match the computational demands of threads to the capabilities of cores. Our results show that in the presence of degraded cores, microvisor mitigates performance losses by an average of two percent.
Pages: 651-665
Page count: 15