A tree-recursive partitioned multicast mechanism for NoC-based deep neural network accelerator

Citations: 0
Authors
Ouyang, Yiming [1 ]
Zhang, Yihe [1 ]
Liang, Huaguo [2 ]
Li, Jianhua [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, 485 Danxia Rd, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Microelectron, 485 Danxia Rd, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Network-on-chips; Deep neural network accelerator; Multicast algorithm; Router architecture; ON-CHIP;
DOI
10.1016/j.mejo.2024.106161
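Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes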
0808 ; 0809 ;
Abstract
In chip multiprocessor (CMP) systems, the Network-on-Chip (NoC) is widely used because of its reusability, high reliability, and low power consumption. Recently, using NoC platforms to accelerate deep neural networks (DNNs) has become a trend. Such a design allows the intermediate computation results of a DNN to be transmitted within the chip, reducing the number of off-chip memory accesses. However, the large amount of one-to-many traffic in a DNN accelerator consumes system bandwidth and significantly degrades the performance of an NoC platform dominated by one-to-one traffic. To address this issue, we propose a tree-based recursive partitioning multicast scheme (TRPM), which increases path diversity and improves system bandwidth. We also design a single-cycle-per-hop router architecture that improves the transmission efficiency of multicast packets. Detailed simulation results show that, compared with the latest tree-based multicast algorithm for DNN accelerators, our scheme reduces, on average, the number of routed packets by 35%, the classification latency by 13.5%, and the average packet latency by 14.5%.
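To make the one-to-many traffic problem concrete, the following Python sketch compares the link traversals of separate unicast packets with those of a generic tree-based multicast on a 2D mesh. It is not the TRPM algorithm: the abstract does not state TRPM's partitioning rule, so this sketch assumes plain XY (dimension-ordered) routing, and the names `unicast_hops` and `multicast_tree_hops` are hypothetical. It only illustrates how recursively partitioning a destination set at each router lets shared path segments carry a single copy of the packet.

```python
from typing import Iterable, Set, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in a 2D mesh


def unicast_hops(src: Node, dests: Iterable[Node]) -> int:
    """Link traversals when each destination receives its own unicast packet."""
    return sum(abs(x - src[0]) + abs(y - src[1]) for x, y in dests)


def multicast_tree_hops(cur: Node, dests: Set[Node]) -> int:
    """Link traversals for a multicast tree built recursively (assumed XY routing).

    At each router the remaining destinations are partitioned by the output
    port that XY routing would select. One copy of the packet is forwarded
    per non-empty partition; a destination equal to `cur` is delivered locally.
    """
    x, y = cur
    partitions = {
        (x + 1, y): {d for d in dests if d[0] > x},                # east
        (x - 1, y): {d for d in dests if d[0] < x},                # west
        (x, y + 1): {d for d in dests if d[0] == x and d[1] > y},  # north
        (x, y - 1): {d for d in dests if d[0] == x and d[1] < y},  # south
    }
    hops = 0
    for next_hop, subset in partitions.items():
        if subset:
            # One link traversal to reach next_hop, then recurse on the subset.
            hops += 1 + multicast_tree_hops(next_hop, subset)
    return hops


if __name__ == "__main__":
    source = (0, 0)
    destinations = {(3, 0), (3, 1), (3, 2), (3, 3)}
    print(unicast_hops(source, destinations))         # 18
    print(multicast_tree_hops(source, destinations))  # 6
```

In this example the four destinations share the eastward path, so the tree uses 6 link traversals where separate unicasts use 18. A scheme with greater path diversity would choose its partitions differently from the fixed XY rule assumed here.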
Pages: 11