An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs

Cited by: 8

Authors:

Zhao, Yunping ^{[1]}

Lu, Jianzhuang ^{[1]}

Chen, Xiaowen ^{[1]}

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China

Source:

SENSORS | 2020 / Vol. 20 / Issue 19

Keywords:

CNNs accelerator; parallel computing algorithm; hardware architecture; CONVOLUTION;

DOI:

10.3390/s20195558

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Due to the high throughput and high computing capability of convolutional neural networks (CNNs), researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix transformation computing algorithm (MTCA) to realize the convolution expansion and resolve the block problem of the intermediate matrix, which enables highly parallel implementation on hardware. Moreover, we provide a specific calculation method for the optimal partition of matrix multiplication to optimize performance. In our evaluation, the proposed method saves more than 60% of hardware storage space compared with the im2col (image to column) approach; in the case of large-scale convolutions, it saves nearly 82% of storage space. Under the accelerator architecture framework designed in this paper, we achieve a performance of 26.7-33.4 GFLOPS (depending on convolution type) on a field-programmable gate array (FPGA) by reducing bandwidth and improving data reusability. It is 1.2x-4.0x faster than memory-efficient convolution (MEC) and im2col, respectively, and represents an effective solution for a large-scale convolution accelerator.
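
For context, the im2col baseline that the abstract compares against can be sketched as below. This is a generic, single-channel illustration in plain Python, not the paper's MTCA or its block partitioning; the function names `im2col`, `conv2d_im2col`, and `conv2d_direct` are hypothetical and chosen only for this example.

```python
def im2col(image, kh, kw, stride=1):
    """Unroll every kh x kw window of a 2-D image (list of rows) into one row.

    Returns the intermediate matrix plus the output height and width.
    """
    h, w = len(image), len(image[0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = []
    for i in range(0, out_h * stride, stride):
        for j in range(0, out_w * stride, stride):
            # Flatten one window; overlapping windows duplicate input values.
            cols.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return cols, out_h, out_w


def conv2d_im2col(image, kernel, stride=1):
    """Convolution (cross-correlation form) as a matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw, stride)
    flat_kernel = [v for row in kernel for v in row]
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in cols]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def conv2d_direct(image, kernel, stride=1):
    """Reference sliding-window convolution without an intermediate matrix."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [[sum(image[r * stride + di][c * stride + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for c in range(out_w)] for r in range(out_h)]


# Illustrative 5 x 5 input and 3 x 3 kernel (values are arbitrary).
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

cols, out_h, out_w = im2col(image, 3, 3)
intermediate_elems = len(cols) * len(cols[0])   # size of the unrolled matrix
input_elems = len(image) * len(image[0])        # size of the original input
result = conv2d_im2col(image, kernel)
```

In this toy case the unrolled matrix holds 81 values for a 25-value input, because overlapping windows are copied repeatedly. That duplication is the kind of storage overhead against which the abstract reports its savings; the actual figures depend on the paper's own configurations.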


Pages: 1 / 15

Page count: 15

50 items in total

[1] A Dynamically Reconfigurable Accelerator Design Using a Sparse-Winograd Decomposition Algorithm for CNNs
Zhao, Yunping
Lu, Jianzhuang
Chen, Xiaowen
CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 66 (01) : 517 - 535
[2] An Efficient FPGA Accelerator Design for Optimized CNNs Using OpenCL
Vemparala, Manoj Rohit
Frickenstein, Alexander
Stechele, Walter
ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2019, 2019, 11479 : 236 - 249
[3] OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm
Lou, Wenqi
Gong, Lei
Wang, Chao
Du, Zidong
Zhou, Xuehai
IEEE TRANSACTIONS ON COMPUTERS, 2021, 71 (08) : 1847 - 1859
[4] OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm
Lou, Wenqi
Wang, Chao
Gong, Lei
Zhou, Xuehai
2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020), 2020, : 410 - 411
[5] Implement Tracking Algorithm Using CNNs
Li, Chaoran
Xi, Yuling
Ding, Songtao
PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 7137 - 7141
[6] An Efficient Sparse CNNs Accelerator on FPGA
Zhang, Yonghua
Jiang, Hongxu
Li, Xiaobin
Wang, Haojie
Dong, Dong
Cao, Yongxiang
2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 504 - 505
[7] Hardware-Efficient Template-Based Deep CNNs Accelerator Design
Alhussain, Azzam
Lin, Mingjie
2022 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, ARCHITECTURE AND STORAGE (NAS), 2022, : 9 - 12
[8] XNORCONV: CNNs accelerator implemented on FPGA using a hybrid CNNs structure and an inter-layer pipeline method
Zhang, Lin
Bu, Xiaokang
Li, Bing
IET IMAGE PROCESSING, 2020, 14 (01) : 105 - 113
[9] A Systolic Dataflow Based Accelerator for CNNs
Das, Saptarsi
Roy, Arnab
Chandrasekharan, Kiran Kolar
Deshwal, Ankur
Lee, Sehwan
2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
[10] PACA: A Pattern Pruning Algorithm and Channel-Fused High PE Utilization Accelerator for CNNs
Wang, Jingyu
Yu, Songming
Yuan, Zhuqing
Yue, Jinshan
Yuan, Zhe
Liu, Ruoyang
Wang, Yanzhi
Yang, Huazhong
Li, Xueqing
Liu, Yongpan
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11) : 5043 - 5056
