High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution

Cited by: 23
Authors
Anderson, Andrew [1]
Vasudevan, Aravind [1]
Keane, Cormac [1]
Gregg, David [1]
Affiliations
[1] Trinity Coll Dublin, Sch Comp Sci & Stat, Dublin, Ireland
Source
2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2020) | 2020
Funding
Science Foundation Ireland; European Union Horizon 2020
Keywords
neural networks; embedded software; performance
DOI
10.1109/SBAC-PAD49847.2020.00024
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep Neural Network Convolution is often implemented with general matrix multiplication (GEMM) using the well-known im2col algorithm. This algorithm constructs a Toeplitz matrix from the input feature maps, and multiplies it by the convolutional kernel. With input feature map dimensions C × H × W and kernel dimensions M × C × K², im2col requires O(K²CHW) additional space. Although this approach is very popular, there has been little study of the associated design space. We show that the im2col algorithm is just one point in a regular design space of algorithms which translate convolution to GEMM. We enumerate this design space, and experimentally evaluate each algorithmic variant. Our evaluation yields several novel low-memory algorithms which match the performance of the best known approaches despite requiring only a small fraction of the additional memory.
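As a rough illustration of the baseline the abstract describes, the following NumPy sketch lowers a C × H × W input to a patch matrix and computes the convolution with a single matrix multiplication. It assumes stride 1 and no padding, and the function names (`im2col`, `conv_im2col`, `conv_direct`) are hypothetical, chosen for this example only. It shows plain im2col, not the low-memory variants the paper proposes.

```python
import numpy as np


def im2col(x, k):
    """Lower a C x H x W input to a (C*k*k) x (H_out*W_out) patch matrix.

    Assumes stride 1 and no padding (an assumption of this sketch).
    """
    c, h, w = x.shape
    h_out, w_out = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, h_out * w_out), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for di in range(k):
            for dj in range(k):
                # Each row holds one kernel offset (ci, di, dj) sampled
                # at every output position.
                cols[row] = x[ci, di:di + h_out, dj:dj + w_out].reshape(-1)
                row += 1
    return cols


def conv_im2col(x, kernel):
    """Convolution (cross-correlation, as in DNNs) via im2col plus one GEMM."""
    m, c, k, _ = kernel.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    cols = im2col(x, k)                        # the O(K^2 CHW) buffer
    out = kernel.reshape(m, c * k * k) @ cols  # the GEMM call
    return out.reshape(m, h_out, w_out)


def conv_direct(x, kernel):
    """Reference loop-nest convolution, used only to check the result."""
    m, c, k, _ = kernel.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((m, h_out, w_out), dtype=x.dtype)
    for mi in range(m):
        for i in range(h_out):
            for j in range(w_out):
                out[mi, i, j] = np.sum(x[:, i:i + k, j:j + k] * kernel[mi])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 6, 5))          # C=3, H=6, W=5
    kernel = rng.standard_normal((4, 3, 3, 3))  # M=4, C=3, K=3
    print(im2col(x, 3).shape)
    print(np.allclose(conv_im2col(x, kernel), conv_direct(x, kernel)))
```

The patch matrix has C·K² rows and (H−K+1)·(W−K+1) columns, which is where the O(K²CHW) additional space quoted in the abstract comes from: every input element is copied up to K² times.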
Pages: 99-106
Page count: 8