On the design of a high-performance adaptive router for CC-NUMA multiprocessors

Cited by: 2
Authors
Puente, V
Gregorio, JA
Beivide, R
Izu, C
Affiliations
[1] Univ Cantabria, E-39005 Santander, Cantabria, Spain
[2] Univ Adelaide, Dept Comp Sci, Adelaide, SA 5005, Australia
Keywords
interconnection networks; adaptive routing; hardware router design; shared memory multiprocessors;
DOI
10.1109/TPDS.2003.1199066
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which yields fully adaptive routing with only three virtual channels. In addition, we selectively use output buffers to implement the most heavily used virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between network performance and hardware cost. The outcome of this research is a High-Performance Adaptive Router (HPAR), which balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation in which HPAR is compared with other adaptive routers using FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation considers both the VLSI cost of each router and its performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. The throughput gains ranged from 10 percent to 40 percent with respect to its most direct rival, which employs more hardware resources. Other results show that HPAR achieves up to 83 percent of its theoretical maximum throughput under random traffic and up to 70 percent when running real applications. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered a suitable candidate for implementing packet interchange in the next generations of CC-NUMA multiprocessors.
Pages: 487-501
Page count: 15