Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

Cited by: 1
Authors
Grimaldi, Matteo [1 ]
Ganji, Darshan C. [1 ]
Lazarevich, Ivan [1 ]
Sah, Sudhakar [1 ]
Affiliations
[1] Deeplite, Toronto, ON, Canada
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW | 2023
DOI: 10.1109/ICCVW60793.2023.00127
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Efficiently processing deep neural networks (DNNs) on embedded devices remains a significant challenge that limits their deployment. Exploiting sparsity in the network's feature maps is one way to reduce inference latency. Unstructured sparsity is known to cause less accuracy degradation than structured sparsity, but it requires extensive inference-engine changes to yield latency benefits. To tackle this challenge, we propose a solution to induce semi-structured activation sparsity that can be exploited through minor runtime modifications. To attain high speedup levels at inference time, we design a sparse training procedure that is aware of the final position of the activations during the General Matrix Multiplication (GEMM) computation. We extensively evaluate the proposed solution across various models for image classification and object detection tasks. Remarkably, our approach yields a 1.25x speedup with a minimal accuracy drop of 1.1% for the ResNet18 model on the ImageNet dataset. Furthermore, when combined with a state-of-the-art structured pruning method, the resulting models provide a good latency-accuracy trade-off, outperforming models that solely employ structured pruning techniques. The code is available at https://github.com/Deeplite/activ-sparse.
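For illustration only, the core idea of semi-structured activation sparsity can be sketched as block-wise masking of feature maps during sparse training. The module name, block size, and magnitude-based selection rule below are assumptions made for this sketch, not the exact procedure or code from the paper (see the repository linked above for the authors' implementation).

```python
# Minimal sketch (PyTorch): zero whole channel blocks of an activation map,
# keeping the blocks with the largest L1 magnitude. The resulting regular
# zero pattern is the kind a GEMM kernel could skip with minor runtime changes.
# Block size, keep ratio, and placement after ReLU are illustrative assumptions.
import torch
import torch.nn as nn


class BlockActivationSparsity(nn.Module):
    def __init__(self, block_size: int = 4, keep_ratio: float = 0.5):
        super().__init__()
        self.block_size = block_size
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        assert c % self.block_size == 0, "channel count must be divisible by block_size"
        # Group channels into blocks: (n, num_blocks, block_size, h, w).
        blocks = x.view(n, c // self.block_size, self.block_size, h, w)
        # Score each block by its L1 norm and keep the top fraction.
        scores = blocks.abs().sum(dim=(2, 3, 4))          # (n, num_blocks)
        k = max(1, int(self.keep_ratio * scores.shape[1]))
        topk = scores.topk(k, dim=1).indices
        mask = torch.zeros_like(scores)
        mask.scatter_(1, topk, 1.0)
        # Broadcast the block mask over channel and spatial dimensions.
        mask = mask[:, :, None, None, None]
        return (blocks * mask).view(n, c, h, w)


# Usage sketch: insert after an activation during sparse fine-tuning.
layer = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    BlockActivationSparsity(block_size=4, keep_ratio=0.5),
)
y = layer(torch.randn(2, 16, 8, 8))  # half of the channel blocks are zeroed
```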
Pages: 1171-1180 (10 pages)