MLCNN: Cross-Layer Cooperative Optimization and Accelerator Architecture for Speeding Up Deep Learning Applications

Cited by: 1
Authors
Jiang, Beilei [1 ]
Cheng, Xianwei [1 ]
Tang, Sihai [1 ]
Ma, Xu [1 ]
Gu, Zhaochen [1 ]
Fu, Song [1 ]
Yang, Qing [1 ]
Liu, Mingxiong [2 ]
Affiliations
[1] Univ North Texas, Denton, TX 76203 USA
[2] Los Alamos Natl Lab, Los Alamos, NM USA
Source
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022) | 2022
Funding
US National Science Foundation (NSF);
Keywords
Deep learning; Cross-layer optimization; Accelerators; Performance evaluation;
DOI
10.1109/IPDPS53621.2022.00118
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
The ever-increasing number of layers, millions of parameters, and large data volumes make deep learning workloads resource-intensive and power-hungry. In this paper, we develop a convolutional neural network (CNN) acceleration framework, named MLCNN, which explores algorithm-hardware co-design to achieve cross-layer cooperative optimization and acceleration. MLCNN dramatically reduces computation and on-/off-chip communication, improving CNN performance. To achieve this, MLCNN reorders the nonlinear activation and pooling layers, which we prove incurs negligible accuracy loss; the convolutional and pooling layers are then co-optimized through redundant multiplication elimination, local addition reuse, and global addition reuse. To the best of our knowledge, MLCNN is the first framework to incorporate cooperative optimization across the convolutional, activation, and pooling layers. We further customize the MLCNN accelerator to take full advantage of cross-layer CNN optimization, reducing both computation and on-/off-chip communication. Our analysis shows that MLCNN can eliminate up to 98% of multiplications and additions. We have implemented a prototype of MLCNN and evaluated its performance on several widely used CNN models, using both an accelerator-level cycle and energy model and an RTL implementation. Experimental results show that MLCNN achieves a 3.2x speedup and 2.9x better energy efficiency compared with dense CNNs. MLCNN's optimization methods are orthogonal to other CNN acceleration techniques such as quantization and pruning: combined with quantization, our quantized MLCNN gains a 12.8x speedup and 11.3x better energy efficiency over the dense CNN baseline (DCNN).
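To make the reordering step concrete, below is a minimal sketch (our illustration, not the authors' code) of the layer swap in PyTorch. It assumes the common case of a ReLU activation followed by max pooling; for this pair the swap is mathematically exact, since max pooling commutes with any monotone non-decreasing activation, which is consistent with the negligible accuracy loss the paper proves.

import torch
import torch.nn.functional as F

# Hypothetical output of a convolutional layer.
x = torch.randn(1, 8, 32, 32)

# Conventional order: ReLU activation, then 2x2 max pooling.
baseline = F.max_pool2d(F.relu(x), kernel_size=2)

# MLCNN order: 2x2 max pooling, then ReLU activation.
reordered = F.relu(F.max_pool2d(x, kernel_size=2))

# ReLU is monotone non-decreasing, so ReLU(max(w)) == max(ReLU(w))
# for every pooling window w: the two orders match exactly.
assert torch.equal(baseline, reordered)

Once pooling precedes the activation, only one value per pooling window survives into the next layer, which is presumably what allows the co-optimized convolution-pooling stage to skip multiplications and additions that would only produce discarded values.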
Pages: 1184-1194
Number of pages: 11