Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs

Cited by: 4
Authors
Oh, Chanyoung [1 ,2 ]
So, Junhyuk [2 ]
Kim, Sumin [2 ]
Yi, Youngmin [2 ]
Affiliations
[1] KT AI2XL, Taebong Ro 151, Seoul 06763, South Korea
[2] University of Seoul, Seoulsiripdae Ro 163, Seoul, South Korea
Keywords
On-device deep learning; convolutional neural network; sparsity
DOI
10.1145/3477008
CLC Classification Number
TP3 [Computing technology; computer technology]
Subject Classification Code
0812
Abstract
Over the past several years, the need for on-device deep learning has been growing rapidly, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but existing studies mostly focus on weight sparsity obtained by weight pruning. Activation sparsity, in contrast, requires compression at runtime for every input tensor; hence, research on activation sparsity has mainly targeted NPUs, which can process it efficiently with dedicated hardware logic. In this paper, we observe that natural activation sparsity is difficult to exploit for accelerating CNN inference on mobile GPUs, and that the widely used CSR-based sparse convolution is not sufficiently effective due to its compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to an extremely high sparsity and choose sparse or dense convolution on a per-layer basis. Further, we present an efficient sparse convolution method that requires no compression and demonstrate that it can be faster than the CSR implementation. With ResNet-50, we achieve a 1.88x speedup over TFLite on a Mali-G76 GPU.
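The abstract contrasts CSR-based sparse convolution, which must compress every activation tensor at runtime, with a compression-free scheme that skips zeros directly. Below is a minimal NumPy/SciPy sketch of that contrast for a single im2col-lowered layer; the shapes, the sparsification threshold, and the helper sparse_matmul_no_compress are illustrative assumptions, not the paper's OpenCL kernels.

```python
import time
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative shapes for one im2col-lowered convolution layer:
# activations A: (output positions, in_channels * k * k)
# weights     W: (in_channels * k * k, out_channels)
rng = np.random.default_rng(0)
A = np.maximum(rng.standard_normal((4096, 1152)), 0.0)  # ReLU leaves ~50% zeros
A[A < 1.2] = 0.0          # hypothetical boosted sparsity (~88% zeros)
W = rng.standard_normal((1152, 256))

# CSR-based sparse convolution: the activation tensor changes with every
# input, so format conversion sits on the critical path of each inference.
t0 = time.perf_counter()
A_csr = csr_matrix(A)     # runtime compression overhead, paid per input
y_csr = A_csr @ W
t_csr = time.perf_counter() - t0

# Compression-free sparse convolution (sketch): test for zeros directly in
# the dense buffer and accumulate only the nonzero contributions. On a
# mobile GPU this zero test would live inside the convolution kernel.
def sparse_matmul_no_compress(A, W):
    out = np.zeros((A.shape[0], W.shape[1]))
    for i in range(A.shape[0]):
        nz = np.flatnonzero(A[i])      # zero test replaces CSR construction
        if nz.size:
            out[i] = A[i, nz] @ W[nz]
    return out

y_nc = sparse_matmul_no_compress(A, W)
assert np.allclose(y_csr, y_nc)
print(f"sparsity: {1 - A_csr.nnz / A.size:.2%}, CSR path: {t_csr * 1e3:.1f} ms")
```

The Python loop is of course slow on a CPU; the sketch is only meant to show where the per-input compression cost appears and how a compression-free kernel avoids it, not to make a performance claim.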
Pages: 25
相关论文
共 29 条
  • [1] Optuna: A Next-generation Hyperparameter Optimization Framework
    Akiba, Takuya
    Sano, Shotaro
    Yanase, Toshihiko
    Ohta, Takeru
    Koyama, Masanori
    [J]. KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 2623 - 2631
  • [2] Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing
    Albericio, Jorge
    Judd, Patrick
    Hetherington, Tayler
    Aamodt, Tor
    Jerger, Natalie Enright
    Moshovos, Andreas
    [J]. 2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, : 1 - 13
  • [3] [Anonymous], 2017, INT C LEARN REPR
  • [4] Cai H., 2019, ARXIV PREPRINT ARXIV, P1
  • [5] SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization
    Cao, Shijie
    Ma, Lingxiao
    Xiao, Wencong
    Zhang, Chen
    Liu, Yunxin
    Zhang, Lintao
    Nie, Lanshun
    Yang, Zhi
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11208 - 11217
  • [6] A fast and elitist multiobjective genetic algorithm: NSGA-II
    Deb, K
    Pratap, A
    Agarwal, S
    Meyarivan, T
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2002, 6 (02) : 182 - 197
  • [7] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [8] Elsen Erich, 2020, P IEEE CVF C COMP VI
  • [9] Spatially Adaptive Computation Time for Residual Networks
    Figurnov, Michael
    Collins, Maxwell D.
    Zhu, Yukun
    Zhang, Li
    Huang, Jonathan
    Vetrov, Dmitry
    Salakhutdinov, Ruslan
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1790 - 1799
  • [10] Accelerating Convolutional Neural Networks via Activation Map Compression
    Georgiadis, Georgios
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 7078 - 7088