A Data-Driven Asynchronous Neural Network Accelerator

Cited by: 11
Authors
Xiao, Shanlin [1]
Liu, Weikun [1]
Lin, Junshu [1]
Yu, Zhiyi [1]
Affiliations
[1] Sun Yat-sen University, School of Electronics and Information Technology, Guangzhou 510006, China
Funding
National Natural Science Foundation of China
Keywords
Accelerator; asynchronous circuit; data-driven; energy-efficiency; neural network; processor
DOI
10.1109/TCAD.2020.3025508
CLC classification
TP3 (computing technology; computer technology)
Subject classification code
0812
Abstract
Deep neural networks (DNNs) are revolutionizing machine learning, achieving unprecedented accuracy on many AI tasks. Energy-efficient neural acceleration is crucial to broadening DNN applications across cloud and mobile devices; however, power-hungry clock networks limit the energy efficiency of DNN accelerators. In this work, we propose a novel DNN hardware accelerator, the asynchronous neural network processor (AsNNP). At the heart of AsNNP is a scalable, hierarchical matrix-multiply unit built from bit-serial processing elements operating in parallel. AsNNP replaces the global clock network with asynchronous handshake protocols for synchronization and communication between modules, minimizing dynamic power. In addition, a fine-grained asynchronous pipeline based on weak-conditioned half-buffers (WCHBs) pipes successive computations in a data-driven manner, i.e., computation begins as soon as data arrives, maximizing throughput. Together, these techniques allow AsNNP to operate in a fully data-driven, asynchronous fashion with optimized energy efficiency. The proposed accelerator is implemented in a quasi-delay-insensitive (QDI) clockless logic family and evaluated in a 65 nm process. Compared with a synchronous baseline, simulation results show that AsNNP achieves 2.2x higher equivalent frequency and 1.59x lower power. Compared with state-of-the-art DNN accelerators, AsNNP delivers a 1.17x-4.97x improvement in energy efficiency.
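The two mechanisms highlighted in the abstract, bit-serial processing elements and data-driven (handshake-triggered) pipelining, can be illustrated with a small behavioral model. The Python sketch below is our own minimal illustration under assumed names (bit_serial_mac and data_driven_pipeline are hypothetical and not from the paper); it models behavior only and is not the authors' QDI/WCHB hardware.

from collections import deque

def bit_serial_mac(weight: int, activation: int, n_bits: int = 8) -> int:
    """Bit-serial multiply: scan the activation one bit per step and
    accumulate the shifted weight for every set bit (weight * activation)."""
    acc = 0
    for bit in range(n_bits):
        if (activation >> bit) & 1:
            acc += weight << bit
    return acc

def data_driven_pipeline(stages, tokens):
    """Token-based pipeline model: a stage fires as soon as a token is
    waiting at its input (handshake-style), with no global clock schedule."""
    queues = [deque(tokens)] + [deque() for _ in stages]
    fired = True
    while fired:
        fired = False
        for i, stage in enumerate(stages):
            if queues[i]:  # "request": input data is available
                queues[i + 1].append(stage(queues[i].popleft()))  # "acknowledge"
                fired = True
    return list(queues[-1])

if __name__ == "__main__":
    weights = [3, -2, 5]
    stages = [
        # stage 1: bit-serial multiply each activation by its weight
        lambda acts: [bit_serial_mac(w, a) for w, a in zip(weights, acts)],
        # stage 2: accumulate the partial products into one output
        lambda prods: sum(prods),
    ]
    # two activation vectors flow through the pipe as soon as they arrive
    print(data_driven_pipeline(stages, [(1, 2, 3), (4, 0, 7)]))  # -> [14, 47]

In hardware, the role played here by non-empty queues is taken by QDI request/acknowledge handshake signals, so a pipeline stage consumes a token the moment it becomes valid instead of waiting for a clock edge.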
Pages: 1874-1886
Number of pages: 13
Related papers
42 records in total
  • [21] Gradient-based learning applied to document recognition
    LeCun, Y.
    Bottou, L.
    Bengio, Y.
    Haffner, P.
    [J]. PROCEEDINGS OF THE IEEE, 1998, 86 (11) : 2278 - 2324
  • [22] UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision
    Lee, Jinmook
    Kim, Changhyeon
    Kang, Sanghoon
    Shin, Dongjoo
    Kim, Sangyeob
    Yoo, Hoi-Jun
    [J]. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2019, 54 (01) : 173 - 185
  • [23] Li X., 2019, Proceedings of the 56th Annual Design Automation Conference (DAC), p. 1
  • [24] Association of thrombocytopenia with in-hospital outcome in patients with acute ST-segment elevated myocardial infarction
    Liu, Ru
    Liu, Jia
    Yang, Jingang
    Gao, Zhan
    Zhao, Xueyan
    Chen, Jue
    Qiao, Shubin
    Gao, Runlin
    Wang, Qingsheng
    Yang, Hongmei
    Wang, Zhifang
    Su, Shuhong
    Yuan, Jinqing
    Yang, Yuejin
    [J]. PLATELETS, 2019, 30 (07) : 844 - 853
  • [25] Cambricon: An Instruction Set Architecture for Neural Networks
    Liu, Shaoli
    Du, Zidong
    Tao, Jinhua
    Han, Dong
    Luo, Tao
    Xie, Yuan
    Chen, Yunji
    Chen, Tianshi
    [J]. 2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, : 393 - 405
  • [26] FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks
    Lu, Wenyan
    Yan, Guihai
    Li, Jiajun
    Gong, Shijun
    Han, Yinhe
    Li, Xiaowei
    [J]. 2017 23RD IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2017, : 553 - 564
  • [27] Compiling Communicating Processes into Delay-Insensitive VLSI Circuits
    Martin, A. J.
    [J]. DISTRIBUTED COMPUTING, 1986, 1 (04) : 226 - 234
  • [28] A million spiking-neuron integrated circuit with a scalable communication network and interface
    Merolla, Paul A.
    Arthur, John V.
    Alvarez-Icaza, Rodrigo
    Cassidy, Andrew S.
    Sawada, Jun
    Akopyan, Filipp
    Jackson, Bryan L.
    Imam, Nabil
    Guo, Chen
    Nakamura, Yutaka
    Brezzo, Bernard
    Vo, Ivan
    Esser, Steven K.
    Appuswamy, Rathinakumar
    Taba, Brian
    Amir, Arnon
    Flickner, Myron D.
    Risk, William P.
    Manohar, Rajit
    Modha, Dharmendra S.
    [J]. SCIENCE, 2014, 345 (6197) : 668 - 673
  • [29] NCL Synthesis With Conventional EDA Tools: Technology Mapping and Optimization
    Moreira, Matheus T.
    Beerel, Peter A.
    Sartori, Marcos L. L.
    Calazans, Ney L. V.
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (06) : 1981 - 1993
  • [30] Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0
    Muralimanohar, Naveen
    Balasubramonian, Rajeev
    Jouppi, Norman P.
    [J]. MICRO-40: PROCEEDINGS OF THE 40TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2007: 3+