A Dependable Coarse-grain Reconfigurable Multicore Array

Cited by: 0
Authors
Smaragdos, Georgios [1]
Khan, Danish Anis [2]
Sourdis, Ioannis [2]
Strydis, Christos [1]
Malek, Alirad [2]
Tzilis, Stavros [2]
Affiliations
[1] Erasmus Univ, Med Ctr, Neurosci Dept, Rotterdam, Netherlands
[2] Chalmers, Comp Sci & Engn Dept, Gothenburg, Sweden
Source
PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) | 2014
Keywords
TOLERANCE;
DOI
10.1109/IPDPSW.2014.20
Chinese Library Classification number
TP301 [Theory, methods];
Subject classification code
081202 ;
Abstract
Continued scaling of semiconductor technology keeps shrinking device sizes. One negative effect of smaller, more complex devices is reduced reliability. This paper addresses permanent-fault tolerance and graceful system degradation in the presence of permanent faults. We exploit the natural redundancy of homogeneous multicores with a sparing strategy that reuses the functional pipeline stages of faulty cores. The cores are placed next to reconfigurable interconnects, which provide the flexibility to redirect the data flow from faulty pipeline stages of damaged cores to spare, still-functional ones. Several micro-architectural changes decouple the processor stages and make them interchangeable. The proposed approach departs from previous ones by offering full flexibility and highly graceful performance degradation at reasonable cost. More specifically, our coarse-grain fault-tolerant multicore array provides up to 4x better availability than a conventional multicore and up to 2x higher probability of delivering at least one functioning core at high fault densities. For our benchmarks, our design (synthesized for STM 65nm SP technology) incurs a total execution-time overhead for the complete system ranging from 1.37x to 3.3x compared to a baseline non-fault-tolerant system, depending on the permanent-fault density. The area overhead is 19.5%, and the energy consumption, without any power- or energy-saving technique, is estimated to be 20.9% higher on average than that of the baseline, unprotected design.
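The stage-level sparing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' design or evaluation method. It assumes a hypothetical pipeline in which every core has the same number of stages, a fault-free interconnect that can route any working stage to any working instance of the next stage, and a fault map given as booleans (`True` means faulty). All function names are illustrative.

```python
from typing import List

# fault_map[c][s] is True when stage s of core c has a permanent fault.
# Assumption: all cores are identical, with the same number of stages.
FaultMap = List[List[bool]]


def conventional_working_cores(fault_map: FaultMap) -> int:
    """Core-level view: a core is usable only if none of its stages is faulty."""
    return sum(1 for core in fault_map if not any(core))


def stage_sparing_working_cores(fault_map: FaultMap) -> int:
    """Stage-level sparing under the idealized assumption of a fully flexible,
    fault-free interconnect: logical cores are assembled from any working
    instance of each stage, so the scarcest stage type is the limit."""
    if not fault_map:
        return 0
    n_stages = len(fault_map[0])
    return min(
        sum(1 for core in fault_map if not core[s])
        for s in range(n_stages)
    )


# Hypothetical example: 4 cores with 4 stages each, three of them damaged
# in different stages.
example = [
    [False, True, False, False],   # core 0: stage 1 faulty
    [False, False, True, False],   # core 1: stage 2 faulty
    [True, False, False, False],   # core 2: stage 0 faulty
    [False, False, False, False],  # core 3: fault-free
]
print(conventional_working_cores(example))   # 1
print(stage_sparing_working_cores(example))  # 3
```

In this toy case, discarding whole cores leaves one usable core, while pooling working stages yields three logical cores. The sketch ignores the execution-time, area, and energy overheads reported in the abstract.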
Pages: 141-150
Page count: 10