A Data-Driven Asynchronous Neural Network Accelerator

Cited by: 11
Authors
Xiao, Shanlin [1 ]
Liu, Weikun [1 ]
Lin, Junshu [1 ]
Yu, Zhiyi [1 ]
Affiliation
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Accelerator; asynchronous circuit; data-driven; energy-efficiency; neural network; processor
DOI
10.1109/TCAD.2020.3025508
CLC Classification Number
TP3 [Computing technology; computer technology]
Discipline Classification Code
0812
Abstract
Deep neural networks (DNNs) are revolutionizing machine learning, achieving unprecedented accuracy on many AI tasks. Energy-efficient neural acceleration is crucial to broadening DNN applications in cloud and mobile end devices. However, power-hungry clock networks limit the energy efficiency of DNN accelerators. In this work, we propose a novel DNN hardware accelerator, the asynchronous neural network processor (AsNNP). At the heart of AsNNP is a scalable, hierarchical matrix multiply unit with bit-serial processing elements working in parallel. It replaces global clock networks with asynchronous handshake protocols for synchronization and communication between components, minimizing dynamic power. Meanwhile, a fine-grain asynchronous pipeline based on weak-conditioned half-buffers (WCHBs) is introduced to pipe successive computations in a data-driven manner, i.e., computation begins as soon as data arrives, maximizing throughput. These techniques enable AsNNP to operate in a fully data-driven asynchronous communication fashion with optimized energy efficiency. The proposed accelerator is implemented in a quasi-delay-insensitive (QDI) clockless logic family and evaluated in a 65-nm process. Compared with the synchronous baseline, simulation results show that AsNNP offers 2.2x higher equivalent frequency and 1.59x lower power. Compared with state-of-the-art DNN accelerators, AsNNP shows a 1.17x-4.97x improvement in energy efficiency.
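As a purely illustrative aid (not the authors' RTL or the actual AsNNP design), the Python sketch below models the two ideas the abstract combines: a bit-serial multiply-accumulate as performed by a bit-serial processing element, and a chain of one-place buffer stages, loosely analogous to WCHB stages, in which a token advances only when its producer holds valid data and its consumer is empty, so progress is driven by data availability rather than a global clock. All names (bit_serial_mac, HalfBufferStage, run_pipeline) are hypothetical.

```python
# Behavioral sketch only: a software analogy of data-driven, handshake-style
# pipelining. It does not model QDI circuits, delays, or power.
from collections import deque


def bit_serial_mac(acc, weight, activation, width=8):
    """Bit-serial multiply-accumulate: scan the activation LSB-first and
    add the shifted weight whenever the scanned bit is 1."""
    for i in range(width):
        if (activation >> i) & 1:
            acc += weight << i
    return acc


class HalfBufferStage:
    """One-place buffer: holds at most one token, analogous to a half-buffer."""

    def __init__(self, func=lambda x: x):
        self.func = func
        self.token = None          # None means the stage is empty

    def can_accept(self):
        return self.token is None

    def accept(self, data):
        assert self.can_accept()
        self.token = self.func(data)

    def release(self):
        data, self.token = self.token, None
        return data


def run_pipeline(stages, inputs):
    """Advance tokens whenever a stage is full and its successor is empty.

    There is no clock: progress is driven purely by token availability,
    which is the data-driven property the abstract describes."""
    pending = deque(inputs)
    outputs = []
    while pending or any(s.token is not None for s in stages):
        if stages[-1].token is not None:          # drain the last stage first
            outputs.append(stages[-1].release())
        for i in range(len(stages) - 2, -1, -1):  # move tokens downstream
            if stages[i].token is not None and stages[i + 1].can_accept():
                stages[i + 1].accept(stages[i].release())
        if pending and stages[0].can_accept():    # inject new input tokens
            stages[0].accept(pending.popleft())
    return outputs


if __name__ == "__main__":
    # Toy example: push (weight, activation) pairs through a MAC stage
    # followed by a pass-through stage standing in for post-processing.
    stages = [
        HalfBufferStage(lambda wa: bit_serial_mac(0, wa[0], wa[1])),
        HalfBufferStage(),
    ]
    print(run_pipeline(stages, [(3, 5), (2, 7), (4, 1)]))  # -> [15, 14, 4]
```

The sketch captures only the control discipline: each stage holds at most one token (half-buffer capacity), and a token moves forward the moment the next stage is free, mirroring "once data arrives, computation begins."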
Pages: 1874-1886
Number of pages: 13