Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks

Cited by: 0
Authors
Motetti, Beatrice Alessandra [1 ]
Risso, Matteo [2 ]
Burrello, Alessio [3 ]
Macii, Enrico [4 ]
Poncino, Massimo [4 ]
Pagliari, Daniele Jahier [4 ]
Affiliations
[1] Politecn Torino, Data Sci & Engn, I-10129 Turin, Italy
[2] Politecn Torino, Elect Engn, I-10129 Turin, Italy
[3] Politecn Torino, I-10129 Turin, Italy
[4] Politecn Torino, Comp Engn, I-10129 Turin, Italy
Keywords
Deep learning; edge computing; quantization; pruning; neural architecture search;
DOI
10.1109/TC.2024.3449084
CLC Number
TP3 [Computing Technology, Computer Technology];
Subject Classification
0812;
Abstract
The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which improve latency and memory occupation. These optimization techniques are usually applied independently. We propose a novel methodology to apply them jointly via a lightweight gradient-based search, and in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks, namely CIFAR-10, Google Speech Commands, and Tiny ImageNet. When targeting the optimization of the memory footprint, we achieve size reductions of 47.50% and 69.54% at iso-accuracy with the baseline networks whose weights are all quantized at 8 and 2 bits, respectively. Our method surpasses a previous state-of-the-art approach with up to 56.17% size reduction at iso-accuracy. With respect to the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results, but with significantly lower training time. In addition, we show how well-tailored cost models can improve the cost versus accuracy trade-offs when targeting specific hardware for deployment.
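The joint search the abstract describes can be illustrated with a differentiable, DNAS-style bit-width selection in which a 0-bit option subsumes channel pruning. The following numpy sketch is illustrative only: the candidate set, cost model, and function names are assumptions, not the paper's actual implementation.

```python
import numpy as np

# Candidate per-channel bit-widths; 0 bits encodes a pruned channel, so
# pruning and mixed-precision quantization share one search space.
# (The exact candidate set is an assumption, not stated in the abstract.)
CANDIDATES = np.array([0, 2, 4, 8])

def quantize(w, bits):
    """Uniform symmetric fake-quantization of a weight vector to `bits` bits."""
    if bits == 0:
        return np.zeros_like(w)  # pruned channel
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def effective_channel(w, logits):
    """Soft mixture of quantized versions of one channel's weights.

    During training, gradients would flow into `logits`, letting the
    search pick a bit-width (or prune entirely) per channel."""
    p = softmax(logits)
    return sum(pi * quantize(w, b) for pi, b in zip(p, CANDIDATES))

def expected_memory_bits(all_logits, weights_per_channel):
    """Differentiable memory-cost proxy: expected bit-width times the
    number of weights in each channel, summed over channels. A term like
    this would be added to the task loss to steer the search."""
    return sum(softmax(l) @ CANDIDATES * n
               for l, n in zip(all_logits, weights_per_channel))
```

In a full training loop, the task loss plus a scaled `expected_memory_bits` (or a hardware-aware latency model in its place) would be minimized jointly over the weights and the per-channel logits, after which each channel is assigned its argmax bit-width.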
Pages: 2619-2633 (15 pages)