An Efficient Network-on-Chip Router for Dataflow Architecture

Cited by: 12

Authors:

Shen, Xiao-Wei [1,2]

Ye, Xiao-Chun [1]

Tan, Xu [1,2]

Wang, Da [1]

Zhang, Lunkai [3]

Li, Wen-Ming [1]

Zhang, Zhi-Min [1]

Fan, Dong-Rui [1]

Sun, Ning-Hui [1]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 100049, Peoples R China

[3] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA

Source:

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY | 2017 / Vol. 32 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

multi-destination; router; network-on-chip; dataflow architecture; high-performance computing; PERFORMANCE;

DOI:

10.1007/s11390-017-1703-5

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, large amounts of data are frequently transferred among processing elements through the network-on-chip (NoC), so the router design has a significant impact on the performance of a dataflow architecture. Common routers are designed for control-flow multi-core architectures, and we find that they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in the NoCs of dataflow architecture: multiple destinations, a high injection rate, and performance that is sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination packets, so it can deliver data to multiple destinations in a single transfer. Moreover, the router adopts output buffering to maximize throughput and non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.

Citation

Pages: 11-25

Page count: 15

