ConvAix: An Application-Specific Instruction-Set Processor for the Efficient Acceleration of CNNs

Cited by: 7
Authors
Bytyn, Andreas [1 ]
Leupers, Rainer [1 ]
Ascheid, Gerd [1 ]
Affiliations
[1] RWTH Aachen University, Institute for Communication Technologies and Embedded Systems, D-52074 Aachen, Germany
Source
IEEE Open Journal of Circuits and Systems | 2021, Vol. 2
Keywords
Application-specific instruction-set processor (ASIP); convolutional neural network (CNN); very large instruction word (VLIW); quantization; low-precision computing; instruction-set architecture (ISA); deep learning; machine learning; processor architecture; subword parallel;
DOI
10.1109/OJCAS.2020.3037758
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
ConvAix is an application-specific instruction-set processor (ASIP) that enables the energy-efficient processing of convolutional neural networks (CNNs) while retaining substantial flexibility through its instruction-set architecture (ISA) based design. By utilizing a combination of data-level parallelism (DLP), instruction-level parallelism (ILP), and subword parallelism, the proposed design offers sufficient processing power for the execution of state-of-the-art CNNs in real time. ConvAix's arithmetic logic units (ALUs) are C-programmable, thereby offering the degree of flexibility required to implement many different convolution layer types, e.g., depthwise-separable convolutions and residual blocks, as well as fully-connected and pooling layers. It comprises a total of 256 ALUs and leverages low-precision computations down to 4 bits. Furthermore, it exploits sparsity in feature maps and weights via zero-guarding of redundant computations to maximize its energy efficiency. The processor was implemented in a modern 28 nm CMOS technology operating at a 1 V supply voltage with a resulting clock frequency of 513 MHz. The final design offers a precision-dependent peak throughput between 263 GOP/s (int16) and 1.1 TOP/s (int4), while consuming between 972 mW and 340 mW of power, resulting in effective energy efficiencies ranging from 176 GOP/s/W to 2 TOP/s/W. Well-known CNNs, such as AlexNet, MobileNet, and ResNet-18, are simulated based on the placed-and-routed netlist, achieving between 233 (AlexNet) and 69 (ResNet-18) frames per second for a batch size of 1, including times for off-chip transfers.
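The quoted peak-throughput range can be sanity-checked with a short back-of-the-envelope calculation. The Python sketch below assumes one multiply-accumulate (MAC) per ALU per cycle at int16, counts each MAC as two operations, and assumes 2x/4x subword packing at int8/int4; these conventions are illustrative assumptions, not statements taken from the paper. Note that the quoted energy efficiencies are described as "effective" (workload-level) figures, so they do not follow from simply dividing peak throughput by power.

    # Back-of-the-envelope check of the quoted peak-throughput figures.
    # Assumptions (not from the paper's text): 1 MAC per ALU per cycle at
    # int16, a MAC counted as 2 operations, and 2x/4x subword packing at
    # int8/int4.
    NUM_ALUS = 256       # total number of ALUs
    CLOCK_HZ = 513e6     # post-layout clock frequency
    OPS_PER_MAC = 2      # one multiply plus one accumulate

    def peak_gops(subword_factor):
        """Peak throughput in GOP/s for a given subword-packing factor."""
        macs_per_cycle = NUM_ALUS * subword_factor
        return macs_per_cycle * OPS_PER_MAC * CLOCK_HZ / 1e9

    print(peak_gops(1))  # int16: ~263 GOP/s, matching the quoted figure
    print(peak_gops(2))  # int8:  ~525 GOP/s
    print(peak_gops(4))  # int4:  ~1051 GOP/s, i.e. roughly the quoted 1.1 TOP/s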
Pages: 3-15
Page count: 13