ConvAix: An Application-Specific Instruction-Set Processor for the Efficient Acceleration of CNNs

Cited by: 7
Authors
Bytyn, Andreas [1 ]
Leupers, Rainer [1 ]
Ascheid, Gerd [1 ]
Affiliations
[1] RWTH Aachen University, Institute for Communication Technologies and Embedded Systems, D-52074 Aachen, Germany
Source
IEEE Open Journal of Circuits and Systems | 2021, Vol. 2
Keywords
Application-specific instruction-set processor (ASIP); convolutional neural network (CNN); very large instruction word (VLIW); quantization; low-precision computing; instruction-set architecture (ISA); deep learning; machine learning; processor architecture; subword parallel;
DOI
10.1109/OJCAS.2020.3037758
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
ConvAix is an application-specific instruction-set processor (ASIP) that enables the energy-efficient processing of convolutional neural networks (CNNs) while retaining substantial flexibility through its instruction-set architecture (ISA) based design. By utilizing a combination of data-level parallelism (DLP), instruction-level parallelism (ILP), and subword parallelism, the proposed design offers sufficient processing power for the execution of state-of-the-art CNNs in real time. ConvAix's arithmetic logic units (ALUs) are C-programmable, thereby offering the degree of flexibility required to implement many different convolution layer types, e.g., depthwise-separable convolutions and residual blocks, as well as fully-connected and pooling layers. It comprises a total of 256 ALUs and leverages low-precision computations down to 4 bits. Furthermore, it exploits sparsity in feature maps and weights via zero-guarding of redundant computations to maximize its energy efficiency. The processor was implemented in a modern 28 nm CMOS technology operating at a 1 V supply voltage with a resulting clock frequency of 513 MHz. The final design offers a precision-dependent peak throughput between 263 GOP/s (int16) and 1.1 TOP/s (int4), while consuming between 972 mW and 340 mW of power, resulting in effective energy efficiencies ranging from 176 GOP/s/W to 2 TOP/s/W. Well-known CNNs, such as AlexNet, MobileNet, and ResNet-18, are simulated based on the placed-and-routed netlist, achieving between 233 (AlexNet) and 69 (ResNet-18) frames per second for a batch size of 1, including times for off-chip transfers.
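The quoted peak-throughput range can be sanity-checked with a short back-of-the-envelope calculation. The Python sketch below assumes one multiply-accumulate (MAC) per ALU per cycle at int16, counts each MAC as two operations, and assumes 2x/4x subword packing at int8/int4; these conventions are illustrative assumptions, not statements taken from the paper. Note that the quoted energy efficiencies are described as "effective" (workload-level) figures, so they do not follow from simply dividing peak throughput by power.

    # Back-of-the-envelope check of the quoted peak-throughput figures.
    # Assumptions (not from the paper's text): 1 MAC per ALU per cycle at
    # int16, a MAC counted as 2 operations, and 2x/4x subword packing at
    # int8/int4.
    NUM_ALUS = 256       # total number of ALUs
    CLOCK_HZ = 513e6     # post-layout clock frequency
    OPS_PER_MAC = 2      # one multiply plus one accumulate

    def peak_gops(subword_factor):
        """Peak throughput in GOP/s for a given subword-packing factor."""
        macs_per_cycle = NUM_ALUS * subword_factor
        return macs_per_cycle * OPS_PER_MAC * CLOCK_HZ / 1e9

    print(peak_gops(1))  # int16: ~263 GOP/s, matching the quoted figure
    print(peak_gops(2))  # int8:  ~525 GOP/s
    print(peak_gops(4))  # int4:  ~1051 GOP/s, i.e. roughly the quoted 1.1 TOP/s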
Pages: 3-15
Page count: 13