Enhancing the Utilization of Processing Elements in Spatial Deep Neural Network Accelerators

Cited by: 4
Authors
Asadikouhanjani, Mohammadreza [1 ]
Ko, Seok-Bum [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Dataflow; deep neural network (DNN); negative output feature; processing element (PE); slack time; zero skipping;
DOI
10.1109/TCAD.2020.3031240
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline code
0812 ;
Abstract
Equipping mobile platforms with deep learning applications is highly valuable: it enables healthcare services in remote areas, improves privacy, and lowers the required communication bandwidth. An efficient computation engine improves the performance of these platforms when running deep neural networks (DNNs). Energy-efficient DNN accelerators prune computations by skipping sparsity and by detecting negative output features early. Unlike other common architectures such as systolic arrays, spatial DNN accelerators can in principle support such computation-pruning techniques. To run these techniques efficiently and avoid network-on-chip (NoC)-based stalls, they need a separate, high-bandwidth data distribution fabric such as buses or trees. Spatial designs also suffer from divergence and unequal work distribution; therefore, applying computation-pruning techniques to a spatial design still causes stalls inside the computation engine, even when the design is equipped with an NoC that provides high bandwidth to the processing elements (PEs). In a spatial architecture, PEs that finish their tasks earlier have slack time relative to the others. In this article, we propose an architecture with negligible area overhead that shares the scratchpads between PEs in a novel way, exploiting the slack time created by computation-pruning techniques or by the NoC format. With our dataflow, a spatial engine benefits more efficiently from computation-pruning and data-reuse techniques. Compared to the reference design, the proposed method achieves a 1.24x speedup and 1.18x higher energy efficiency per inference.
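The slack-time effect the abstract describes can be illustrated with a minimal sketch (not the paper's design): with zero skipping, a PE's cycle count depends on its input sparsity, so PEs diverge and the engine stalls on the slowest one; idealized redistribution of the remaining work across PEs recovers that slack. All names and the redistribution model below are illustrative assumptions.

```python
def pe_cycles(activations, weights):
    """Cycles for one PE with zero skipping: a MAC is issued only when
    both operands are nonzero, so sparser inputs finish earlier."""
    return sum(1 for a, w in zip(activations, weights) if a != 0 and w != 0)

def engine_cycles(workloads, share_slack=False):
    """Without sharing, the engine waits for the slowest PE.
    With idealized slack sharing, leftover MACs are balanced across PEs."""
    per_pe = [pe_cycles(a, w) for a, w in workloads]
    if not share_slack:
        return max(per_pe)          # stall time is set by the slowest PE
    total = sum(per_pe)             # redistribute the remaining MACs evenly
    n = len(per_pe)
    return -(-total // n)           # ceil(total / n)

# Two PEs, one dense and one sparse activation vector: divergence appears.
dense  = ([1, 2, 3, 4], [1, 1, 1, 1])
sparse = ([1, 0, 0, 0], [1, 1, 1, 1])
baseline = engine_cycles([dense, sparse])                    # 4 cycles
shared   = engine_cycles([dense, sparse], share_slack=True)  # 3 cycles
print(baseline, shared)
```

Here the sparse PE is idle for three of the four baseline cycles; letting it absorb part of the dense PE's work cuts the engine's makespan, which is the kind of gain the scratchpad-sharing dataflow targets.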
Pages: 1947-1951
Page count: 5
References
13 references in total
  • [1] A Novel Architecture for Early Detection of Negative Output Features in Deep Neural Network Accelerators
    Asadikouhanjani, Mohammadreza
    Ko, Seok-Bum
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2020, 67 (12) : 3332 - 3336
  • [2] NoC-based DNN Accelerator: A Future Design Paradigm
    Chen, Kun-Chih
    Ebrahimi, Masoumeh
    Wang, Ting-Yi
    Yang, Yuch-Chi
    [J]. PROCEEDINGS OF THE 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON NETWORKS-ON-CHIP (NOCS'19), 2019,
  • [3] DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
    Chen, Tianshi
    Du, Zidong
    Sun, Ninghui
    Wang, Jia
    Wu, Chengyong
    Chen, Yunji
    Temam, Olivier
    [J]. ACM SIGPLAN NOTICES, 2014, 49 (04) : 269 - 283
  • [4] Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
    Chen, Yu-Hsin
    Yang, Tien-Ju
    Emer, Joel S.
    Sze, Vivienne
    [J]. IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2019, 9 (02) : 292 - 308
  • [5] Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
    Chen, Yu-Hsin
    Krishna, Tushar
    Emer, Joel S.
    Sze, Vivienne
    [J]. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2017, 52 (01) : 127 - 138
  • [6] DaDianNao: A Machine-Learning Supercomputer
    Chen, Yunji
    Luo, Tao
    Liu, Shaoli
    Zhang, Shijin
    He, Liqiang
    Wang, Jia
    Li, Ling
    Chen, Tianshi
    Xu, Zhiwei
    Sun, Ninghui
    Temam, Olivier
    [J]. 2014 47TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2014, : 609 - 622
  • [7] Energy-Efficient Design of Processing Element for Convolutional Neural Network
    Choi, Yeongjae
    Bae, Dongmyung
    Sim, Jaehyeong
    Choi, Seungkyu
    Kim, Minhye
    Kim, Lee-Sup
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2017, 64 (11) : 1332 - 1336
  • [8] Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA
    Guo, Kaiyuan
    Sui, Lingzhi
    Qiu, Jiantao
    Yu, Jincheng
    Wang, Junbin
    Yao, Song
    Han, Song
    Wang, Yu
    Yang, Huazhong
    [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (01) : 35 - 47
  • [9] ImageNet Classification with Deep Convolutional Neural Networks
    Krizhevsky, Alex
    Sutskever, Ilya
    Hinton, Geoffrey E.
    [J]. COMMUNICATIONS OF THE ACM, 2017, 60 (06) : 84 - 90
  • [10] MobileNetV2: Inverted Residuals and Linear Bottlenecks
    Sandler, Mark
    Howard, Andrew
    Zhu, Menglong
    Zhmoginov, Andrey
    Chen, Liang-Chieh
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4510 - 4520