Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Cited by: 5
Authors
Gong, Yifan [1 ]
Yuan, Geng [1 ]
Zhan, Zheng [1 ]
Niu, Wei [2 ]
Li, Zhengang [1 ]
Zhao, Pu [1 ]
Cai, Yuxuan [1 ]
Liu, Sijia [3 ]
Ren, Bin [2 ]
Lin, Xue [1 ]
Tang, Xulong [4 ]
Wang, Yanzhi [1 ]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] Coll William & Mary, Williamsburg, VA USA
[3] Michigan State Univ, E Lansing, MI 48824 USA
[4] Univ Pittsburgh, Pittsburgh, PA USA
Funding
U.S. National Science Foundation
Keywords
Network pruning; mobile acceleration; neural architecture search
DOI
10.1145/3495532
CLC Classification Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Weight pruning is an effective model compression technique for achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction to certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. Because our compiler optimizations allow different pruning schemes to be applied to different layers, we further investigate the new problem of determining the best-suited pruning scheme for each layer, given that the various pruning schemes differ in both acceleration and accuracy. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework, achieving up to 2.48x and 1.73x DNN inference acceleration on the CIFAR-10 and ImageNet datasets, respectively, without accuracy loss.
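As an illustration only, and not the authors' implementation or compiler code, the sketch below shows the core idea behind block-based fine-grained structured pruning in NumPy: a layer's weight matrix is tiled into small blocks, and whole blocks with the lowest L2 norm are zeroed until a target sparsity is reached. The function name block_prune and the block_size and sparsity parameters are hypothetical; in the paper, the pruning regularity and block size for each layer are chosen automatically by the proposed search-based or rule-based mapping method.

```python
import numpy as np

def block_prune(weight, block_size=(4, 2), sparsity=0.5):
    """Illustrative block-based structured pruning (hypothetical helper).

    Tiles a 2-D weight matrix into (bh x bw) blocks, ranks blocks by their
    L2 norm, and zeroes the weakest fraction given by `sparsity`.
    """
    bh, bw = block_size
    h, w = weight.shape
    assert h % bh == 0 and w % bw == 0, "weight must tile evenly into blocks"

    # View the matrix as a grid of blocks and score each block by its L2 norm.
    blocks = weight.reshape(h // bh, bh, w // bw, bw)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))   # shape: (h//bh, w//bw)

    # Keep the k strongest blocks and zero the rest (ties may keep a few extra).
    k = max(int(scores.size * (1.0 - sparsity)), 1)
    threshold = np.sort(scores, axis=None)[::-1][k - 1]
    mask = (scores >= threshold).astype(weight.dtype)

    # Broadcast the block-level mask back to element granularity.
    mask = np.repeat(np.repeat(mask, bh, axis=0), bw, axis=1)
    return weight * mask

# Example: prune 75% of the 4x2 blocks in an 8x8 weight matrix.
w = np.random.randn(8, 8).astype(np.float32)
w_pruned = block_prune(w, block_size=(4, 2), sparsity=0.75)
```

In a per-layer mapping, each layer would receive its own block_size (or a coarser or unstructured scheme), trading accuracy against how regularly the remaining weights can be packed for mobile hardware acceleration.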
Pages: 26