OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA

Cited by: 8
Authors
Castro, Roberto L. [1]
Andrade, Diego [1]
Fraguela, Basilio B. [1]
Affiliations
[1] Univ A Coruna, Ctr Invest CITIC, Campus Elvina, La Coruna 15071, Spain
Keywords
deep learning; convolution; Winograd; CUDA
DOI
10.3390/math9172033
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers because of its prevalence in deep learning applied mainly to video processing. The improvement is being driven by both algorithmic and implementation innovations. Algorithmically, the convolution can be computed directly from its mathematical definition, but other methods transform it into a Fast Fourier Transform (FFT) or a GEneral Matrix Multiplication (GEMM). Within the latter group, the Winograd algorithm is a state-of-the-art variant that is especially suitable for small convolutions. In this paper, we present openCNN, an optimized CUDA C++ implementation of the Winograd convolution algorithm. Our approach achieves speedups of up to 1.76x on a Turing RTX 2080Ti and up to 1.85x on an Ampere RTX 3090 with respect to the Winograd convolution in cuDNN 8.2.0. OpenCNN is released as open-source software.
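As an illustration of the Winograd minimal filtering idea named in the abstract, the following minimal sketch computes the one-dimensional case F(2,3): two outputs of a 3-tap filter using 4 multiplications instead of the 6 needed by the direct definition. It is plain Python written for this record, not code from openCNN, and the function names `winograd_f2_3` and `direct_f2_3` are hypothetical.

```python
def direct_f2_3(d, g):
    """Two outputs of a 3-tap filter from the definition (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f2_3(d, g):
    """Same two outputs via Winograd F(2,3) (4 multiplications).

    d: 4 input samples, g: 3 filter taps.
    """
    # Filter transform: depends only on g, so it can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # Element-wise products: the only 4 multiplications on the data path.
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 2.0, 3.0]
    print(direct_f2_3(d, g))
    print(winograd_f2_3(d, g))
```

Two-dimensional variants used for small convolution kernels nest this one-dimensional scheme along rows and columns; that step is not shown here.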
Pages: 19