Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy

Cited by: 0
Authors
Yung-Chang Chang
Cihun-Siyong Alex Gong
Ching-Te Chiu
Affiliations
[1] National Tsing Hua University, Department of Computer Science
[2] Chang Gung University, Department of Electrical Engineering, College of Engineering
[3] Chang Gung University, Portable Energy System Group, Green Technology Research Center, College of Engineering
[4] Chang Gung Memorial Hospital, Department of Ophthalmology
Source
Journal of Signal Processing Systems | 2020 / Vol. 92
Keywords
Fault tolerance; Interconnections; Integrated circuit reliability; Network topology
DOI
Not available
Abstract
Aggressively scaled CMOS technology increasingly threatens the dependability of network-on-chip (NoC) architectures. In a mesh-based NoC, a faulty router or broken link may isolate a fully functional processing element (PE). In addition, a set of faulty routers may form isolated regions, which can degrade the design. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme that differs from the traditional microarchitecture-level redundancy (MLR) approach and relieves the problems of isolated PEs and isolated regions. Simply adding one spare router to each router set in a mesh creates RLR and diversifies the connection paths between adjacent routers. To exploit this extra resource, two reconfiguration algorithms are demonstrated that detour around observed faulty routers and links. The proposed RLR fault-tolerant scheme can tolerate at most one faulty router within a router set. After reconfiguration, the original mesh topology is maintained. As a result, the proposed architecture does not need any support from network-layer routing algorithms. The scheme has been evaluated on three fault-tolerance metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance of RLR increases as the size of the NoC grows, while the relative connection cost decreases. This characteristic makes the architecture suitable for large-scale NoC designs.
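As a rough illustration of the "at most one faulty router within a router set" property, the sketch below compares the reliability of a mesh with and without one spare per set. It is not the paper's model: it assumes independent router failures, an identical per-router reliability `r`, fault-free links and reconfiguration logic, and a hypothetical set size of four primary routers (the abstract does not state the set size). The function names are illustrative only.

```python
from math import comb


def set_reliability(r: float, n_primary: int, n_spare: int = 1) -> float:
    """Probability that a router set survives.

    Assumption: the set holds n_primary + n_spare routers with independent,
    identical reliability r, and it survives if at most n_spare routers fail.
    """
    total = n_primary + n_spare
    return sum(
        comb(total, k) * (1 - r) ** k * r ** (total - k)
        for k in range(n_spare + 1)
    )


def noc_reliability(r: float, n_primary: int, n_sets: int, n_spare: int = 1) -> float:
    """Reliability of a NoC made of n_sets independent router sets."""
    return set_reliability(r, n_primary, n_spare) ** n_sets


def baseline_reliability(r: float, n_primary: int, n_sets: int) -> float:
    """Reliability without spares: every router must work."""
    return r ** (n_primary * n_sets)


if __name__ == "__main__":
    # Hypothetical numbers: r = 0.9, four primary routers per set, 16 sets.
    print(noc_reliability(0.9, n_primary=4, n_sets=16))
    print(baseline_reliability(0.9, n_primary=4, n_sets=16))
```

Under these assumptions a single set with one spare survives with probability r^5 + 5 r^4 (1 - r), compared with r^4 without the spare, so the spare raises the per-set reliability whenever 0 < r < 1.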
Pages: 345-355
Number of pages: 10