Accelerating Convolutional Neural Network with FFT on Tiny Cores

Authors
Abtahi, Tahmid [1]
Kulkarni, Amey [1]
Mohsenin, Tinoosh [1]
Affiliations
[1] Univ Maryland, Dept Comp Sci & Elect Engn, Baltimore, MD 21201 USA
Source
2017 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS) | 2017
Keywords
Convolutional Neural Network; Domain-Specific Many-Core Accelerator; FFT Overlap and Add; Energy Efficient
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Fueled by the ILSVRC and COCO competitions, the Convolutional Neural Network (CNN) has become important in computer vision and natural language processing. However, state-of-the-art CNNs are computationally and memory intensive, so energy-efficient implementation on embedded platforms is challenging. Recently, VGGNet and ResNet showed that deep neural networks with more convolution layers (CV) and few fully connected layers (FC) can achieve lower error rates; reducing the complexity of the convolution layers is therefore of utmost importance. To reduce computation and shared memory usage in convolution layers, in this paper we evaluate the performance of direct convolution (Direct-Conv), Fast Fourier Transform (FFT) based convolution (FFT-Conv), and Overlap and Add FFT convolution (FFT-OVA-Conv) on embedded architectures, including a low-power domain-specific many-core architecture called Power Efficient Nano Clusters (PENC) and an ARM Cortex A53 CPU. To demonstrate the efficiency of FFT-Conv and FFT-OVA-Conv, we map ResNet-20 for the CIFAR-10 dataset onto PENC as well as the ARM Cortex A53 CPU. Results are evaluated and compared with respect to throughput per watt, energy-delay product, and execution time for the three methods. Using the built-in FFT instruction in PENC, FFT-OVA-Conv performs 2.9x and 1.65x faster and achieves 6.7x and 2.3x better throughput per watt than Direct-Conv and FFT-Conv, respectively. On the ARM A53 CPU, FFT-OVA-Conv achieves 3.36x and 1.38x improvement in execution time and 2.72x and 1.32x better throughput than Direct-Conv and FFT-Conv, respectively.
Pages: 4
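
Illustrative sketch
The three convolution strategies named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' PENC or ARM implementation: the function names, the `tile` parameter, and the 32x32 input with a 3x3 kernel are assumptions chosen for illustration, and the code computes a full 2-D linear convolution on a single channel only.

```python
import numpy as np


def direct_conv2d(image, kernel):
    """Full 2-D linear convolution by explicit multiply-accumulate."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    # Each kernel tap adds a shifted, scaled copy of the image.
    for r in range(kh):
        for c in range(kw):
            out[r:r + ih, c:c + iw] += kernel[r, c] * image
    return out


def fft_conv2d(image, kernel):
    """Same result via one large FFT: zero-pad, multiply spectra, invert."""
    oh = image.shape[0] + kernel.shape[0] - 1
    ow = image.shape[1] + kernel.shape[1] - 1
    spec = np.fft.rfft2(image, s=(oh, ow)) * np.fft.rfft2(kernel, s=(oh, ow))
    return np.fft.irfft2(spec, s=(oh, ow))


def fft_ova_conv2d(image, kernel, tile=6):
    """Overlap-and-add: convolve small tiles with small FFTs, then sum
    the overlapping borders of the partial results.

    `tile` is a hypothetical parameter (tile side length in pixels).
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    fh, fw = tile + kh - 1, tile + kw - 1      # FFT size per tile
    kernel_spec = np.fft.rfft2(kernel, s=(fh, fw))  # computed once, reused
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for r in range(0, ih, tile):
        for c in range(0, iw, tile):
            block = image[r:r + tile, c:c + tile]   # may be smaller at edges
            bh, bw = block.shape
            spec = np.fft.rfft2(block, s=(fh, fw)) * kernel_spec
            part = np.fft.irfft2(spec, s=(fh, fw))
            # Add the tile's full-convolution result at its offset.
            out[r:r + bh + kh - 1, c:c + bw + kw - 1] += part[:bh + kh - 1,
                                                              :bw + kw - 1]
    return out


# Assumed example sizes: a CIFAR-10-sized 32x32 plane and a 3x3 kernel.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((3, 3))
reference = direct_conv2d(image, kernel)
print(np.abs(fft_conv2d(image, kernel) - reference).max())
print(np.abs(fft_ova_conv2d(image, kernel) - reference).max())
```

All three functions should agree to floating-point precision. The overlap-and-add variant replaces one large transform with many small ones whose size depends only on the tile and kernel, which is the property the abstract associates with lower computation and shared memory use. A CNN layer would additionally sum over input channels, and frameworks typically use cross-correlation with cropping rather than the full convolution shown here.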