Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy

Cited by: 5
Authors
Chang, Yung-Chang [1]
Gong, Cihun-Siyong Alex [2,3,4]
Chiu, Ching-Te [1]
Affiliations
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Chang Gung Univ, Coll Engn, Dept Elect Engn, Taoyuan, Taiwan
[3] Chang Gung Univ, Coll Engn, Green Technol Res Ctr, Portable Energy Syst Grp, Taoyuan, Taiwan
[4] Chang Gung Mem Hosp, Dept Ophthalmol, Taoyuan, Taiwan
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2020 / Vol. 92 / Issue 04
Keywords
Fault tolerance; Interconnections; Integrated circuit reliability; Network topology; ON-CHIP; NETWORK; DESIGN
DOI
10.1007/s11265-019-01476-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Aggressive CMOS technology scaling increasingly threatens the dependability of network-on-chip (NoC) architectures. In a mesh-based NoC, a faulty router or a broken link may isolate an otherwise fully functional processing element (PE). In addition, a set of faulty routers may form isolated regions, which can degrade the design as a whole. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme that, unlike the traditional microarchitecture-level redundancy (MLR) approach, relieves the problems of isolated PEs and isolated regions. RLR is created simply by adding one spare router to each router set in a mesh, which also diversifies the connection paths between adjacent routers. To exploit this extra resource, two reconfiguration algorithms are presented that detour around observed faulty routers and links. The proposed RLR scheme can tolerate at most one faulty router within each router set. After reconfiguration, the original mesh topology is preserved; as a result, the proposed architecture needs no support from network-layer routing algorithms. The scheme is evaluated with three fault-tolerance metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance of RLR improves as the NoC size grows, while the relative connection cost decreases at the same time. This characteristic makes the architecture suitable for large-scale NoC designs.
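The abstract names reliability and MTTF as evaluation metrics but gives no formulas. The Python sketch below illustrates one simple way to reason about them, under assumptions that are ours and not the paper's: every router fails independently with an exponential lifetime of rate `lam`, each router set has `set_size` active routers plus one spare, a set survives while at most one of its routers has failed, and links and switching logic never fail. The hypothetical `remap` helper shows why substituting a spare can leave the logical mesh, and therefore the routing algorithm, untouched; it is a generic spare-shift scheme, not either of the paper's two reconfiguration algorithms.

```python
import math
from typing import List, Optional


def remap(set_size: int, faulty: Optional[int]) -> List[int]:
    """Hypothetical spare-shift remapping for one router set.

    Logical positions 0..set_size-1 are served by physical routers
    0..set_size, where physical index set_size is the spare. Every logical
    position at or after the faulty router shifts onto the next physical
    router, so the logical mesh seen by routing does not change.
    """
    if faulty is None:
        return list(range(set_size))  # no fault: the spare stays idle
    if not 0 <= faulty <= set_size:
        raise ValueError("faulty router index out of range")
    return [i if i < faulty else i + 1 for i in range(set_size)]


def set_reliability(r: float, set_size: int) -> float:
    """P(at most one of set_size + 1 i.i.d. routers has failed).

    r is the reliability of a single router at the time of interest.
    """
    n = set_size + 1  # active routers plus one spare
    return r ** n + n * (1.0 - r) * r ** (n - 1)


def mesh_reliability(r: float, set_size: int, num_sets: int) -> float:
    """Sets fail independently; the mesh survives only if every set does."""
    return set_reliability(r, set_size) ** num_sets


def mttf(lam: float, set_size: int, num_sets: int, steps: int = 200_000) -> float:
    """MTTF = integral of R(t) dt, with r(t) = exp(-lam * t) per router.

    Uses the trapezoidal rule on [0, 50 / lam]; the tail beyond that
    point contributes a negligible amount for these models.
    """
    t_max = 50.0 / lam
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        rt = mesh_reliability(math.exp(-lam * k * h), set_size, num_sets)
        # Endpoints get half weight in the trapezoidal rule.
        total += rt if 0 < k < steps else rt / 2.0
    return total * h
```

For example, `mesh_reliability(0.95, 4, 4)` models a hypothetical 16-router mesh split into four sets and can be compared with the spare-free baseline `0.95 ** 16`. As a sanity check, `mttf(1.0, 1, 1)` comes out at about 1.5, matching the closed form 2/lam - 1/(2*lam) for a one-router set with one spare.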
Pages: 345-355
Number of pages: 11