An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs

Cited by: 8
Authors
Zhao, Yunping [1]
Lu, Jianzhuang [1]
Chen, Xiaowen [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
Keywords
CNNs accelerator; parallel computing algorithm; hardware architecture; CONVOLUTION;
DOI
10.3390/s20195558
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Due to the high throughput and high computing capability of convolutional neural networks (CNNs), researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix transformation computing algorithm (MTCA) to realize the convolution expansion and resolve the block problem of the intermediate matrix, which enables a highly parallel implementation on hardware. We also provide a specific calculation method for the optimal partition of matrix multiplication to optimize performance. In our evaluation, the proposed method saves more than 60% of hardware storage space compared with the im2col (image to column) approach; in the case of large-scale convolutions, it saves nearly 82% of storage space. Under the accelerator architecture framework designed in this paper, we achieve a performance of 26.7 GFLOPS to 33.4 GFLOPS (depending on convolution type) on an FPGA (Field Programmable Gate Array) by reducing bandwidth and improving data reusability. It is 1.2x-4.0x faster than memory-efficient convolution (MEC) and im2col, respectively, and represents an effective solution for a large-scale convolution accelerator.
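For readers unfamiliar with the im2col baseline named in the abstract, the following is a minimal sketch of the standard im2col lowering for a single-channel 2-D convolution, written in pure Python. It is not the MTCA method of the paper, whose details are not given in this record; the function names (`im2col`, `conv2d_via_im2col`, `im2col_expansion_ratio`) are hypothetical and chosen only for illustration. The sketch shows why the intermediate matrix costs extra storage: overlapping windows are copied out in full.

```python
def im2col(image, k, stride=1):
    """Lower a single-channel image (list of rows) into an im2col matrix.

    Each row of the result is one flattened k x k window, so overlapping
    pixels are duplicated across rows.
    """
    h, w = len(image), len(image[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    rows = []
    for i in range(0, out_h * stride, stride):
        for j in range(0, out_w * stride, stride):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows


def conv2d_via_im2col(image, kernel, stride=1):
    """Compute a valid (no padding) convolution as a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    cols = im2col(image, k, stride)
    out_w = (len(image[0]) - k) // stride + 1
    flat = [sum(a * b for a, b in zip(row, flat_kernel)) for row in cols]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


def im2col_expansion_ratio(h, w, k, stride=1):
    """Elements in the im2col matrix divided by elements in the input."""
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    return (out_h * out_w * k * k) / (h * w)
```

Under these assumptions, a 4 x 4 input with a 3 x 3 kernel and stride 1 already needs an intermediate matrix larger than the input itself, and the ratio grows with input size. That overhead is what the storage comparison in the abstract is measured against.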
Pages: 1 / 15
Page count: 15
Related papers
50 in total
  • [1] A Dynamically Reconfigurable Accelerator Design Using a Sparse-Winograd Decomposition Algorithm for CNNs
    Zhao, Yunping
    Lu, Jianzhuang
    Chen, Xiaowen
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 66 (01) : 517 - 535
  • [2] An Efficient FPGA Accelerator Design for Optimized CNNs Using OpenCL
    Vemparala, Manoj Rohit
    Frickenstein, Alexander
    Stechele, Walter
    ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2019, 2019, 11479 : 236 - 249
  • [3] OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm
    Lou, Wenqi
    Gong, Lei
    Wang, Chao
    Du, Zidong
    Zhou, Xuehai
    IEEE TRANSACTIONS ON COMPUTERS, 2021, 71 (08) : 1847 - 1859
  • [4] OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm
    Lou, Wenqi
    Wang, Chao
    Gong, Lei
    Zhou, Xuehai
    2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020), 2020, : 410 - 411
  • [5] Implement Tracking Algorithm Using CNNs
    Li, Chaoran
    Xi, Yuling
    Ding, Songtao
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 7137 - 7141
  • [6] An Efficient Sparse CNNs Accelerator on FPGA
    Zhang, Yonghua
    Jiang, Hongxu
    Li, Xiaobin
    Wang, Haojie
    Dong, Dong
    Cao, Yongxiang
    2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 504 - 505
  • [7] Hardware-Efficient Template-Based Deep CNNs Accelerator Design
    Alhussain, Azzam
    Lin, Mingjie
    2022 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, ARCHITECTURE AND STORAGE (NAS), 2022, : 9 - 12
  • [8] XNORCONV: CNNs accelerator implemented on FPGA using a hybrid CNNs structure and an inter-layer pipeline method
    Zhang, Lin
    Bu, Xiaokang
    Li, Bing
    IET IMAGE PROCESSING, 2020, 14 (01) : 105 - 113
  • [9] A Systolic Dataflow Based Accelerator for CNNs
    Das, Saptarsi
    Roy, Arnab
    Chandrasekharan, Kiran Kolar
    Deshwal, Ankur
    Lee, Sehwan
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [10] PACA: A Pattern Pruning Algorithm and Channel-Fused High PE Utilization Accelerator for CNNs
    Wang, Jingyu
    Yu, Songming
    Yuan, Zhuqing
    Yue, Jinshan
    Yuan, Zhe
    Liu, Ruoyang
    Wang, Yanzhi
    Yang, Huazhong
    Li, Xueqing
    Liu, Yongpan
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11) : 5043 - 5056