Accelerating Convolutional Neural Network with FFT on Tiny Cores

Cited by: 0

Authors:

Abtahi, Tahmid [1]

Kulkarni, Amey [1]

Mohsenin, Tinoosh [1]

Affiliation:

[1] Univ Maryland, Dept Comp Sci & Elect Engn, Baltimore, MD 21201 USA

Source:

2017 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS) | 2017

Keywords:

Convolutional Neural Network; Domain-Specific Many-Core Accelerator; FFT Overlap and Add; Energy Efficient;

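DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code: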

0808 ; 0809 ;

Abstract:

Fueled by the ILSVRC and COCO competitions, Convolutional Neural Networks (CNNs) have become important in computer vision and natural language processing. However, state-of-the-art CNNs are computationally and memory intensive, so energy-efficient implementation on embedded platforms is challenging. Recently, VGGNet and ResNet showed that deep neural networks with more convolution layers (CV) and few fully connected layers (FC) can achieve lower error rates, so reducing the complexity of the convolution layers is of utmost importance. To reduce computation and shared memory usage in convolution layers, this paper evaluates the performance of direct convolution (Direct-Conv), Fast Fourier Transform (FFT) based convolution (FFT-Conv), and Overlap-and-Add FFT convolution (FFT-OVA-Conv) on embedded architectures, including a low-power domain-specific many-core architecture called Power Efficient Nano Clusters (PENC) and an ARM Cortex A53 CPU. To demonstrate the efficiency of FFT-Conv and FFT-OVA-Conv, we map ResNet-20 for the CIFAR-10 dataset onto PENC as well as onto the ARM Cortex A53 CPU. Results are evaluated and compared with respect to throughput per watt, energy-delay product, and execution time for the three methods. Using the built-in FFT instruction in PENC, FFT-OVA-Conv runs 2.9x and 1.65x faster and achieves 6.7x and 2.3x better throughput per watt than Direct-Conv and FFT-Conv, respectively. On the ARM A53 CPU, FFT-OVA-Conv achieves 3.36x and 1.38x improvement in execution time and 2.72x and 1.32x better throughput than Direct-Conv and FFT-Conv, respectively.
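
As an illustration of the three convolution methods compared in the abstract, the following is a minimal 1-D sketch in pure Python. It is not the authors' PENC or ARM implementation: the paper works on 2-D feature maps and uses a built-in FFT instruction, neither of which is modeled here. The function names (`direct_conv`, `fft_conv`, `fft_ova_conv`), the default block size, and the software radix-2 FFT are all assumptions made for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two.
    The inverse transform is left unscaled (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def direct_conv(x, h):
    """Direct linear convolution: O(len(x) * len(h)) multiply-adds."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fft_conv(x, h):
    """FFT convolution: one transform sized to the whole output."""
    out_len = len(x) + len(h) - 1
    n = next_pow2(out_len)
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / n for v in y[:out_len]]


def fft_ova_conv(x, h, block=4):
    """Overlap-and-add: split x into blocks, convolve each block with h
    using a small FFT, and add the overlapping tails into the output."""
    n = next_pow2(block + len(h) - 1)
    H = fft(list(h) + [0.0] * (n - len(h)))  # kernel transform, computed once
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = list(x[start:start + block])
        S = fft(seg + [0.0] * (n - len(seg)))
        part = fft([a * b for a, b in zip(S, H)], inverse=True)
        for k in range(len(seg) + len(h) - 1):
            y[start + k] += part[k].real / n  # tails of adjacent blocks overlap here
    return y
```

Called on the same input and kernel, the three functions should agree to within floating-point rounding. In this sketch, overlap-and-add uses many small transforms sized to one block plus the kernel, instead of a single transform sized to the whole input, which is the general motivation for the approach when the kernel is much smaller than the input.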

Pages: 4
