Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks

Cited by: 0
Authors
Motetti, Beatrice Alessandra [1 ]
Risso, Matteo [2 ]
Burrello, Alessio [3 ]
Macii, Enrico [4 ]
Poncino, Massimo [4 ]
Pagliari, Daniele Jahier [4 ]
Affiliations
[1] Politecn Torino, Data Sci & Engn, I-10129 Turin, Italy
[2] Politecn Torino, Elect Engn, I-10129 Turin, Italy
[3] Politecn Torino, I-10129 Turin, Italy
[4] Politecn Torino, Comp Engn, I-10129 Turin, Italy
Keywords
Deep learning; edge computing; quantization; pruning; neural architecture search
DOI
10.1109/TC.2024.3449084
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to this issue are pruning and mixed-precision quantization, which improve both latency and memory occupation. These optimization techniques are usually applied independently. We propose a novel methodology to apply them jointly, via a lightweight gradient-based search and in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks, namely CIFAR-10, Google Speech Commands, and Tiny ImageNet. When optimizing the memory footprint, we achieve size reductions of 47.50% and 69.54% at iso-accuracy with respect to baseline networks whose weights are all quantized to 8 and 2 bits, respectively. Our method surpasses a previous state-of-the-art approach with up to 56.17% size reduction at iso-accuracy. Compared to the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results, but with a significantly lower training time. In addition, we show how well-tailored cost models can improve the cost versus accuracy trade-offs when targeting specific hardware for deployment.
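To make the idea of a joint, gradient-based search concrete, the following is a minimal PyTorch sketch (not the authors' released code) of how channel pruning and channel-wise mixed-precision quantization can be optimized together: each output channel of a convolution gets a trainable keep/prune logit and a trainable distribution over candidate bit-widths, and a differentiable memory-cost term can be added to the task loss. All names (JointPruneQuantConv2d, mask_logits, bw_logits, expected_size_bits), the candidate bit-widths, and the use of a straight-through estimator for fake quantization are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch of joint channel pruning + channel-wise mixed-precision
    # quantization via a gradient-based search (assumed names and details).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointPruneQuantConv2d(nn.Module):
        """Conv2d whose output channels can be softly pruned and assigned a per-channel bit-width."""

        def __init__(self, in_ch, out_ch, k, bitwidths=(2, 4, 8)):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            self.bitwidths = bitwidths
            # Trainable architecture parameters: one keep/prune logit per channel,
            # and one logit per (channel, candidate bit-width) pair.
            self.mask_logits = nn.Parameter(torch.zeros(out_ch))
            self.bw_logits = nn.Parameter(torch.zeros(out_ch, len(bitwidths)))

        def _fake_quant(self, w, bits):
            # Symmetric fake quantization with a straight-through estimator (STE).
            scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-8
            q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
            return w + (q * scale - w).detach()  # forward: quantized; backward: identity

        def forward(self, x):
            keep = torch.sigmoid(self.mask_logits)      # soft pruning mask, shape (out_ch,)
            probs = F.softmax(self.bw_logits, dim=-1)   # per-channel bit-width probabilities
            # Per-channel mixture of fake-quantized weights over the candidate precisions.
            w = sum(
                probs[:, i].view(-1, 1, 1, 1) * self._fake_quant(self.conv.weight, b)
                for i, b in enumerate(self.bitwidths)
            )
            w = keep.view(-1, 1, 1, 1) * w               # apply soft channel pruning
            return F.conv2d(x, w, self.conv.bias, padding=self.conv.padding)

        def expected_size_bits(self):
            # Differentiable memory cost: kept channels weighted by their expected bit-width.
            keep = torch.sigmoid(self.mask_logits)
            exp_bits = (F.softmax(self.bw_logits, dim=-1)
                        * self.bw_logits.new_tensor(self.bitwidths)).sum(-1)
            params_per_ch = self.conv.weight[0].numel()
            return (keep * exp_bits * params_per_ch).sum()

In a training loop, one would typically minimize task_loss plus a regularization strength times the sum of expected_size_bits() over all such layers, trading accuracy against memory (or, with a different differentiable cost model, latency on the target hardware); after the search, channels with low keep probability are removed and each surviving channel is assigned its highest-probability bit-width.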
Pages: 2619-2633
Page count: 15