A Data-Driven Asynchronous Neural Network Accelerator

Cited by: 11
Authors
Xiao, Shanlin [1]
Liu, Weikun [1]
Lin, Junshu [1]
Yu, Zhiyi [1]
Affiliations
[1] Sun Yat-sen University, School of Electronics and Information Technology, Guangzhou 510006, China
Funding
National Natural Science Foundation of China
Keywords
Accelerator; asynchronous circuit; data-driven; energy-efficiency; neural network; processor
DOI
10.1109/TCAD.2020.3025508
CLC classification
TP3 (computing technology; computer technology)
Subject classification code
0812
Abstract
Deep neural networks (DNNs) are revolutionizing machine learning, achieving unprecedented accuracy on many AI tasks. Energy-efficient neural acceleration is crucial to broadening DNN applications across cloud and mobile devices; however, power-hungry clock networks limit the energy efficiency of DNN accelerators. In this work, we propose a novel DNN hardware accelerator, the asynchronous neural network processor (AsNNP). At the heart of AsNNP is a scalable, hierarchical matrix-multiply unit built from bit-serial processing elements operating in parallel. AsNNP replaces the global clock network with asynchronous handshake protocols for synchronization and communication between modules, minimizing dynamic power. In addition, a fine-grained asynchronous pipeline based on weak-conditioned half-buffers (WCHBs) pipes successive computations in a data-driven manner, i.e., computation begins as soon as data arrives, maximizing throughput. Together, these techniques allow AsNNP to operate in a fully data-driven, asynchronous fashion with optimized energy efficiency. The proposed accelerator is implemented in a quasi-delay-insensitive (QDI) clockless logic family and evaluated in a 65 nm process. Compared with a synchronous baseline, simulation results show that AsNNP achieves 2.2x higher equivalent frequency and 1.59x lower power. Compared with state-of-the-art DNN accelerators, AsNNP delivers a 1.17x-4.97x improvement in energy efficiency.
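The two mechanisms highlighted in the abstract, bit-serial processing elements and data-driven (handshake-triggered) pipelining, can be illustrated with a small behavioral model. The Python sketch below is our own minimal illustration under assumed names (bit_serial_mac and data_driven_pipeline are hypothetical and not from the paper); it models behavior only and is not the authors' QDI/WCHB hardware.

from collections import deque

def bit_serial_mac(weight: int, activation: int, n_bits: int = 8) -> int:
    """Bit-serial multiply: scan the activation one bit per step and
    accumulate the shifted weight for every set bit (weight * activation)."""
    acc = 0
    for bit in range(n_bits):
        if (activation >> bit) & 1:
            acc += weight << bit
    return acc

def data_driven_pipeline(stages, tokens):
    """Token-based pipeline model: a stage fires as soon as a token is
    waiting at its input (handshake-style), with no global clock schedule."""
    queues = [deque(tokens)] + [deque() for _ in stages]
    fired = True
    while fired:
        fired = False
        for i, stage in enumerate(stages):
            if queues[i]:  # "request": input data is available
                queues[i + 1].append(stage(queues[i].popleft()))  # "acknowledge"
                fired = True
    return list(queues[-1])

if __name__ == "__main__":
    weights = [3, -2, 5]
    stages = [
        # stage 1: bit-serial multiply each activation by its weight
        lambda acts: [bit_serial_mac(w, a) for w, a in zip(weights, acts)],
        # stage 2: accumulate the partial products into one output
        lambda prods: sum(prods),
    ]
    # two activation vectors flow through the pipe as soon as they arrive
    print(data_driven_pipeline(stages, [(1, 2, 3), (4, 0, 7)]))  # -> [14, 47]

In hardware, the role played here by non-empty queues is taken by QDI request/acknowledge handshake signals, so a pipeline stage consumes a token the moment it becomes valid instead of waiting for a clock edge.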
Pages: 1874-1886
Number of pages: 13
Related papers
42 records in total
  • [21] Gradient-based learning applied to document recognition
    LeCun, Y.
    Bottou, L.
    Bengio, Y.
    Haffner, P.
    [J]. PROCEEDINGS OF THE IEEE, 1998, 86 (11) : 2278 - 2324
  • [22] UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision
    Lee, Jinmook
    Kim, Changhyeon
    Kang, Sanghoon
    Shin, Dongjoo
    Kim, Sangyeob
    Yoo, Hoi-Jun
    [J]. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2019, 54 (01) : 173 - 185
  • [23] Li X., 2019, Proceedings of the 56th Annual Design Automation Conference (DAC), p. 1
  • [24] Association of thrombocytopenia with in-hospital outcome in patients with acute ST-segment elevated myocardial infarction
    Liu, Ru
    Liu, Jia
    Yang, Jingang
    Gao, Zhan
    Zhao, Xueyan
    Chen, Jue
    Qiao, Shubin
    Gao, Runlin
    Wang, Qingsheng
    Yang, Hongmei
    Wang, Zhifang
    Su, Shuhong
    Yuan, Jinqing
    Yang, Yuejin
    [J]. PLATELETS, 2019, 30 (07) : 844 - 853
  • [25] Cambricon: An Instruction Set Architecture for Neural Networks
    Liu, Shaoli
    Du, Zidong
    Tao, Jinhua
    Han, Dong
    Luo, Tao
    Xie, Yuan
    Chen, Yunji
    Chen, Tianshi
    [J]. 2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, : 393 - 405
  • [26] FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks
    Lu, Wenyan
    Yan, Guihai
    Li, Jiajun
    Gong, Shijun
    Han, Yinhe
    Li, Xiaowei
    [J]. 2017 23RD IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2017, : 553 - 564
  • [27] Compiling Communicating Processes into Delay-Insensitive VLSI Circuits
    Martin, A. J.
    [J]. DISTRIBUTED COMPUTING, 1986, 1 (04) : 226 - 234
  • [28] A million spiking-neuron integrated circuit with a scalable communication network and interface
    Merolla, Paul A.
    Arthur, John V.
    Alvarez-Icaza, Rodrigo
    Cassidy, Andrew S.
    Sawada, Jun
    Akopyan, Filipp
    Jackson, Bryan L.
    Imam, Nabil
    Guo, Chen
    Nakamura, Yutaka
    Brezzo, Bernard
    Vo, Ivan
    Esser, Steven K.
    Appuswamy, Rathinakumar
    Taba, Brian
    Amir, Arnon
    Flickner, Myron D.
    Risk, William P.
    Manohar, Rajit
    Modha, Dharmendra S.
    [J]. SCIENCE, 2014, 345 (6197) : 668 - 673
  • [29] NCL Synthesis With Conventional EDA Tools: Technology Mapping and Optimization
    Moreira, Matheus T.
    Beerel, Peter A.
    Sartori, Marcos L. L.
    Calazans, Ney L. V.
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (06) : 1981 - 1993
  • [30] Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0
    Muralimanohar, Naveen
    Balasubramonian, Rajeev
    Jouppi, Norman P.
    [J]. MICRO-40: PROCEEDINGS OF THE 40TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2007: 3+