Compact Convolutional Neural Network Accelerator for IoT Endpoint SoC

Cited by: 14

Authors:

Ge, Fen [1,2]

Wu, Ning [1]

Xiao, Hao [3]

Zhang, Yuanyuan [1]

Zhou, Fang [1]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 211106, Jiangsu, Peoples R China

[2] Sci & Technol Elect Informat Control Lab, Chengdu 610036, Sichuan, Peoples R China

[3] HeFei Univ Technol, Sch Microelect, Hefei 230009, Anhui, Peoples R China

Source:

ELECTRONICS | 2019 / Vol. 8 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

Convolutional neural network (CNN); Internet of Things (IoT); endpoint SoC; FPGA; Cortex-M3;

DOI:

10.3390/electronics8050497

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

As a classical artificial intelligence algorithm, the convolutional neural network (CNN) plays an important role in image recognition and classification and is gradually being applied in Internet of Things (IoT) systems. This paper proposes a compact CNN accelerator for the IoT endpoint System-on-Chip (SoC) to meet the needs of CNN computations. Based on an analysis of the CNN structure, basic functional modules of the CNN, such as the convolution circuit and the pooling circuit, are designed with a low data bandwidth and a small area, and the accelerator is constructed in the form of four acceleration chains. A verification SoC is then built around the Cortex-M3, and the verification platform is implemented on an FPGA to evaluate the resource consumption and performance of the CNN accelerator. The CNN accelerator achieves a throughput of 6.54 GOPS (giga operations per second) while consuming 4901 LUTs and no hardware multipliers. The comparison shows that, with the proposed compact accelerator, the CNN computational power of the Cortex-M3-based SoC is two times higher than that of a quad-core Cortex-A7 SoC and 67% of that of an eight-core Cortex-A53 SoC.
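As a software reference for the two basic modules the abstract names (convolution and pooling), the following minimal Python sketch may help. It is not the authors' hardware design: the function names, the stride-1 unpadded convolution, the 2x2 non-overlapping max pooling, and the convention of counting each multiply and each add as one operation are all assumptions made for illustration.

```python
def conv2d_valid(image, kernel):
    """Stride-1 convolution without padding (cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_op_count(out_h, out_w, kh, kw):
    """Operation count for one feature map, assuming 1 multiply + 1 add per kernel tap."""
    return 2 * out_h * out_w * kh * kw


# Hypothetical 4x4 input and 2x2 kernel, chosen only to exercise the functions.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
feature_map = conv2d_valid(image, kernel)
pooled = max_pool2d(feature_map)
```

Dividing an operation count of this kind by the measured execution time gives a throughput figure in operations per second, which is the unit behind the GOPS number quoted in the abstract; the exact counting convention used in the paper is not stated in this record.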

References

Pages: 15

19 entries in total

[1] [Anonymous], P ISCIT 2018 18 INT

[2] [Anonymous], ARXIV170905116

[3] [Anonymous], 2019, ELECTRONICS SWITZ, DOI 10.3390/ELECTRONICS8030281

[4] ARM, 2015, ARM CORTEX M3 PROCES, V1, P121

[5] Origami: A 803-GOp/s/W Convolutional Network Accelerator [J]. Cavigelli, Lukas; Benini, Luca. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (11): 2461-2475

[6] Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks [J]. Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel S.; Sze, Vivienne. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2017, 52 (01): 127-138

[7] An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics [J]. Conti, Francesco; Schilling, Robert; Schiavone, Pasquale Davide; Pullini, Antonio; Rossi, Davide; Gurkaynak, Frank Kagan; Muehlberghuber, Michael; Gautschi, Michael; Loi, Igor; Haugou, Germain; Mangard, Stefan; Benini, Luca. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2017, 64 (09): 2481-2494

[8] A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things [J]. Du, Li; Du, Yuan; Li, Yilei; Su, Junjie; Kuan, Yen-Cheng; Liu, Chun-Chen; Chang, Mau-Chung Frank. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (01): 198-208

[9] Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA [J]. Guo, Kaiyuan; Sui, Lingzhi; Qiu, Jiantao; Yu, Jincheng; Wang, Junbin; Yao, Song; Han, Song; Wang, Yu; Yang, Huazhong. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (01): 35-47

[10] Han S., 2015, Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, P1, DOI 10.00149/1510.00149
