An Efficient Network-on-Chip Router for Dataflow Architecture

Cited by: 12
Authors
Shen, Xiao-Wei [1, 2]
Ye, Xiao-Chun [1]
Tan, Xu [1, 2]
Wang, Da [1]
Zhang, Lunkai [3]
Li, Wen-Ming [1]
Zhang, Zhi-Min [1]
Fan, Dong-Rui [1]
Sun, Ning-Hui [1]
Affiliations
[1] Chinese Academy of Sciences, Institute of Computing Technology, State Key Laboratory of Computer Architecture, Beijing 100190, China
[2] University of Chinese Academy of Sciences, School of Computer and Control Engineering, Beijing 100049, China
[3] University of Chicago, Department of Computer Science, Chicago, IL 60637, USA
Funding
National Natural Science Foundation of China
Keywords
multi-destination; router; network-on-chip; dataflow architecture; high-performance computing; performance
DOI
10.1007/s11390-017-1703-5
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, large amounts of data are frequently transferred among processing elements through the network-on-chip (NoC), so the router design has a significant impact on the performance of a dataflow architecture. Common routers are designed for control-flow multi-core architectures, and we find that they are not suitable for dataflow architecture. In this work, we analyze and extract three features of data transfers in the NoCs of dataflow architecture: multiple destinations, a high injection rate, and performance that is sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination packets, so it can deliver data to multiple destinations in a single transfer. Moreover, the router adopts output buffering to maximize throughput and non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
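The multi-destination idea in the abstract can be illustrated with a small sketch. The abstract does not describe the router's actual forwarding logic, so everything here is an assumption made for illustration: a 2D mesh, dimension-order (XY) routing, and the hypothetical function name `split_by_output_port`. The sketch only shows how one packet carrying several destinations might be partitioned per output port, so that each port forwards a single copy instead of one unicast packet per destination.

```python
from collections import defaultdict

def split_by_output_port(current, destinations):
    """Group a packet's destinations by the output port each one needs.

    Assumes a 2D mesh with XY routing (resolve the x offset first, then y).
    This is an illustrative assumption, not the routing scheme of the paper.
    """
    cx, cy = current
    ports = defaultdict(list)
    for dx, dy in destinations:
        if dx > cx:
            port = "EAST"
        elif dx < cx:
            port = "WEST"
        elif dy > cy:
            port = "NORTH"
        elif dy < cy:
            port = "SOUTH"
        else:
            port = "LOCAL"  # this router's own processing element
        ports[port].append((dx, dy))
    return dict(ports)

# Hypothetical example: a router at (1, 1) receives one packet for four nodes.
groups = split_by_output_port((1, 1), [(3, 1), (3, 2), (1, 1), (1, 0)])
for port, dests in groups.items():
    print(port, dests)
```

In this hypothetical case the two destinations to the east share one forwarded copy, so the router sends three copies rather than four unicast packets.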
Pages: 11-25
Number of pages: 15