A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator in 65nm CMOS

Cited by: 109
Authors
Kumar, Amit [1]
Kundu, Partha [2]
Singh, Arvind P. [3]
Peh, Li-Shiuan [1]
Jha, Niraj K. [1]
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
[2] Intel Corp, Microproc Technol Labs, Santa Clara, CA 95052 USA
[3] Intel Technol India Pvt Ltd, Bangalore 560017, Karnataka, India
Source
2007 IEEE International Conference on Computer Design, Vols 1 and 2 | 2007
DOI
10.1109/ICCD.2007.4601881
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
As chip multiprocessors (CMPs) become the only viable way to scale up and utilize the abundant transistors made available in current microprocessors, the design of on-chip networks is becoming critically important. These networks face unique design constraints and are required to provide extremely fast, high-bandwidth communication while meeting tight power and area budgets. In this paper, we present a detailed design of our on-chip network router targeted at a 36-core shared-memory CMP system in 65nm technology. Our design targets an aggressive clock frequency of 3.6GHz, which poses tough design challenges that led to several unique circuit and microarchitectural innovations and design choices: a novel high-throughput, low-latency switch allocation mechanism; a non-speculative single-cycle router pipeline that uses advanced bundles to remove control setup overhead; a low-complexity virtual channel allocator; and a dynamically managed shared buffer design that uses prefetching to minimize critical path delay. Our router occupies 1.19 mm² and consumes 551 mW at 10% activity, delivering a single-cycle no-load latency at a 3.6GHz clock frequency while achieving a peak switching data rate in excess of 4.6Tbits/s per router node.
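The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below derives the bits switched per clock cycle from the stated data rate and clock frequency, plus the energy per cycle and power density from the stated power and area. The abstract does not give the port count or flit width, so `ASSUMED_PORTS` and `ASSUMED_FLIT_BITS` are hypothetical values chosen only to show one configuration that would be consistent with the quoted rate.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 3.6e9        # 3.6 GHz clock
PEAK_RATE_BPS = 4.6e12  # "in excess of 4.6 Tbits/s" per router node
POWER_W = 0.551         # 551 mW at 10% activity
AREA_MM2 = 1.19         # router area in mm^2

# Hypothetical configuration, NOT stated in the abstract.
ASSUMED_PORTS = 5
ASSUMED_FLIT_BITS = 256


def bits_per_cycle(rate_bps: float, clock_hz: float) -> float:
    """Bits that must cross the switch each cycle to sustain rate_bps."""
    return rate_bps / clock_hz


def peak_rate_bps(ports: int, flit_bits: int, clock_hz: float) -> float:
    """Peak rate if every port moves one flit per cycle."""
    return ports * flit_bits * clock_hz


if __name__ == "__main__":
    per_cycle = bits_per_cycle(PEAK_RATE_BPS, CLOCK_HZ)
    assumed_rate = peak_rate_bps(ASSUMED_PORTS, ASSUMED_FLIT_BITS, CLOCK_HZ)
    energy_pj = POWER_W / CLOCK_HZ * 1e12  # energy per cycle at 10% activity
    density = POWER_W / AREA_MM2           # W per mm^2

    print(f"bits per cycle implied by abstract: {per_cycle:.0f}")
    print(f"rate under assumed config: {assumed_rate / 1e12:.3f} Tbit/s")
    print(f"energy per cycle: {energy_pj:.1f} pJ")
    print(f"power density: {density:.3f} W/mm^2")
```

Under these assumptions the switch would need to move roughly 1278 bits per cycle, which five 256-bit ports (1280 bits) would just exceed. This is a consistency check on the published numbers, not a description of the router's actual configuration.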
Pages: 63 / +
Page count: 2