Optimizing N-Dimensional, Winograd-Based Convolution for Manycore CPUs

Cited by: 33
Authors
Jia, Zhen [1]
Zlateski, Aleksandar [2]
Durand, Fredo [2]
Li, Kai [1]
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
[2] MIT, Cambridge, MA 02139 USA
Keywords
convolution; Winograd; vectorization; parallelization
DOI
10.1145/3200691.3178496
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Recent work on Winograd-based convolution allows for a great reduction of computational complexity, but existing implementations are limited to 2D data and a single kernel size of 3 by 3. They achieve only slightly better, and often worse, performance than better-optimized direct convolution implementations. We propose and implement an algorithm for N-dimensional Winograd-based convolution that allows arbitrary kernel sizes and is optimized for manycore CPUs. Our algorithm achieves high hardware utilization through a series of optimizations. Our experiments show that on modern ConvNets, our optimized implementation is on average more than 3x, and sometimes 8x, faster than other state-of-the-art CPU implementations on Intel Xeon Phi manycore processors. Moreover, our implementation on the Xeon Phi achieves competitive performance for 2D ConvNets and superior performance for 3D ConvNets, compared with the best GPU implementations.
Pages: 109-123
Number of pages: 15