Design and Implementation of Deep Learning 2D Convolutions on Modern CPUs

Cited by: 2
Authors
Kelefouras, Vasilios [1 ]
Keramidas, Georgios [2 ]
Affiliations
[1] Plymouth Univ, Dept Comp, Plymouth PL4 8AA, England
[2] Aristoteleio Panepistemio Thessalonikes, Dept Comp Engn & Informat, Saloniki 54124, Greece
Keywords
Deep neural networks; convolution; oneDNN; optimization; analytical model; vectorization; register blocking; loop tiling;
DOI
10.1109/TPDS.2023.3322037
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202
Abstract
In this article, a new method is presented for accelerating the execution of convolution layers in Deep Neural Networks. This work provides the theoretical background for efficiently designing and implementing convolution layers on x86/x64 CPUs, based on the target layer parameters, the quantization level, and the hardware architecture. The proposed approach is general and can also be applied to other processor families, e.g., Arm. It achieves high speedups over the state of the art, the Intel oneDNN library, by applying compiler optimizations such as vectorization, register blocking, and loop tiling in a more efficient way; this is achieved through an analytical modelling approach for finding the optimization parameters. A thorough experimental evaluation has been carried out on two Intel CPU platforms, for DenseNet-121, ResNet-50, and SqueezeNet (covering 112 different convolution layers), and for both FP32 and int8 input/output tensors (quantization). The experimental results show that the convolution layers of the aforementioned models are executed from 1.1x up to 7.2x faster.
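
To illustrate the kind of loop structure the abstract refers to, the following is a minimal sketch of a direct FP32 2D convolution combining loop tiling over output channels, register blocking along the output width, and AVX2 vectorization. It is not the article's actual method nor oneDNN code; the tile size OC_TILE, the register block W_BLOCK, the NCHW/KCRS layouts, stride 1, and the no-padding assumption are all illustrative choices, whereas the article derives such parameters analytically from the layer and hardware characteristics.

/* Illustrative sketch only: direct 2D convolution with loop tiling,
 * register blocking on the output width, and AVX2/FMA vectorization.
 * Compile with e.g. gcc -O3 -mavx2 -mfma. */
#include <immintrin.h>
#include <stddef.h>

#define OC_TILE 16   /* assumed output-channel tile size              */
#define W_BLOCK 8    /* assumed register block: 8 FP32 lanes (__m256) */

/* in: [C][H][W], weights: [K][C][R][S], out: [K][H_out][W_out].
 * Stride 1, no padding; W_out assumed to be a multiple of W_BLOCK. */
static void conv2d_tiled(const float *in, const float *w, float *out,
                         int C, int H, int W, int K, int R, int S)
{
    const int H_out = H - R + 1, W_out = W - S + 1;

    for (int ko = 0; ko < K; ko += OC_TILE)                 /* loop tiling   */
    for (int k = ko; k < ko + OC_TILE && k < K; ++k)
    for (int y = 0; y < H_out; ++y)
    for (int x = 0; x < W_out; x += W_BLOCK) {              /* reg. blocking */
        __m256 acc = _mm256_setzero_ps();                   /* 8 outputs     */
        for (int c = 0; c < C; ++c)
        for (int r = 0; r < R; ++r)
        for (int s = 0; s < S; ++s) {
            __m256 vin = _mm256_loadu_ps(
                &in[((size_t)c * H + y + r) * W + x + s]);
            __m256 vw  = _mm256_set1_ps(
                w[(((size_t)k * C + c) * R + r) * S + s]);
            acc = _mm256_fmadd_ps(vin, vw, acc);            /* vectorized MAC */
        }
        _mm256_storeu_ps(&out[((size_t)k * H_out + y) * W_out + x], acc);
    }
}

In a real implementation the tile and block sizes would be chosen per layer (e.g., from the register file size, cache sizes, and kernel dimensions), which is the role of the analytical model described in the abstract.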
Pages: 3104 - 3116
Number of pages: 13