Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks

Cited by: 0
Authors
Motetti, Beatrice Alessandra [1 ]
Risso, Matteo [2 ]
Burrello, Alessio [3 ]
Macii, Enrico [4 ]
Poncino, Massimo [4 ]
Pagliari, Daniele Jahier [4 ]
Affiliations
[1] Politecn Torino, Data Sci & Engn, I-10129 Turin, Italy
[2] Politecn Torino, Elect Engn, I-10129 Turin, Italy
[3] Politecn Torino, I-10129 Turin, Italy
[4] Politecn Torino, Comp Engn, I-10129 Turin, Italy
Keywords
Deep learning; edge computing; quantization; pruning; neural architecture search
DOI
10.1109/TC.2024.3449084
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to this issue are pruning and mixed-precision quantization, which improve both latency and memory occupation. These optimization techniques are usually applied independently. We propose a novel methodology to apply them jointly, via a lightweight gradient-based search and in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks, namely CIFAR-10, Google Speech Commands, and Tiny ImageNet. When optimizing the memory footprint, we achieve size reductions of 47.50% and 69.54% at iso-accuracy with respect to baseline networks whose weights are all quantized to 8 and 2 bits, respectively. Our method surpasses a previous state-of-the-art approach with up to 56.17% size reduction at iso-accuracy. Compared to the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results, but with a significantly lower training time. In addition, we show how well-tailored cost models can improve the cost versus accuracy trade-offs when targeting specific hardware for deployment.
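To make the idea of a joint, gradient-based search concrete, the following is a minimal PyTorch sketch (not the authors' released code) of how channel pruning and channel-wise mixed-precision quantization can be optimized together: each output channel of a convolution gets a trainable keep/prune logit and a trainable distribution over candidate bit-widths, and a differentiable memory-cost term can be added to the task loss. All names (JointPruneQuantConv2d, mask_logits, bw_logits, expected_size_bits), the candidate bit-widths, and the use of a straight-through estimator for fake quantization are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch of joint channel pruning + channel-wise mixed-precision
    # quantization via a gradient-based search (assumed names and details).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointPruneQuantConv2d(nn.Module):
        """Conv2d whose output channels can be softly pruned and assigned a per-channel bit-width."""

        def __init__(self, in_ch, out_ch, k, bitwidths=(2, 4, 8)):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            self.bitwidths = bitwidths
            # Trainable architecture parameters: one keep/prune logit per channel,
            # and one logit per (channel, candidate bit-width) pair.
            self.mask_logits = nn.Parameter(torch.zeros(out_ch))
            self.bw_logits = nn.Parameter(torch.zeros(out_ch, len(bitwidths)))

        def _fake_quant(self, w, bits):
            # Symmetric fake quantization with a straight-through estimator (STE).
            scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-8
            q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
            return w + (q * scale - w).detach()  # forward: quantized; backward: identity

        def forward(self, x):
            keep = torch.sigmoid(self.mask_logits)      # soft pruning mask, shape (out_ch,)
            probs = F.softmax(self.bw_logits, dim=-1)   # per-channel bit-width probabilities
            # Per-channel mixture of fake-quantized weights over the candidate precisions.
            w = sum(
                probs[:, i].view(-1, 1, 1, 1) * self._fake_quant(self.conv.weight, b)
                for i, b in enumerate(self.bitwidths)
            )
            w = keep.view(-1, 1, 1, 1) * w               # apply soft channel pruning
            return F.conv2d(x, w, self.conv.bias, padding=self.conv.padding)

        def expected_size_bits(self):
            # Differentiable memory cost: kept channels weighted by their expected bit-width.
            keep = torch.sigmoid(self.mask_logits)
            exp_bits = (F.softmax(self.bw_logits, dim=-1)
                        * self.bw_logits.new_tensor(self.bitwidths)).sum(-1)
            params_per_ch = self.conv.weight[0].numel()
            return (keep * exp_bits * params_per_ch).sum()

In a training loop, one would typically minimize task_loss plus a regularization strength times the sum of expected_size_bits() over all such layers, trading accuracy against memory (or, with a different differentiable cost model, latency on the target hardware); after the search, channels with low keep probability are removed and each surviving channel is assigned its highest-probability bit-width.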
Pages: 2619-2633
Page count: 15