Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks

Cited by: 0
Authors
Motetti, Beatrice Alessandra [1 ]
Risso, Matteo [2 ]
Burrello, Alessio [3 ]
Macii, Enrico [4 ]
Poncino, Massimo [4 ]
Pagliari, Daniele Jahier [4 ]
Affiliations
[1] Politecn Torino, Data Sci & Engn, I-10129 Turin, Italy
[2] Politecn Torino, Elect Engn, I-10129 Turin, Italy
[3] Politecn Torino, I-10129 Turin, Italy
[4] Politecn Torino, Comp Engn, I-10129 Turin, Italy
Keywords
Deep learning; edge computing; quantization; pruning; neural architecture search;
DOI
10.1109/TC.2024.3449084
CLC Number
TP3 [Computing Technology, Computer Technology];
Subject Classification
0812;
Abstract
The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which improve latency and memory occupation. These optimization techniques are usually applied independently. We propose a novel methodology to apply them jointly via a lightweight gradient-based search, and in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks, namely CIFAR-10, Google Speech Commands, and Tiny ImageNet. When targeting the optimization of the memory footprint, we achieve size reductions of 47.50% and 69.54% at iso-accuracy with the baseline networks whose weights are all quantized at 8 and 2 bits, respectively. Our method surpasses a previous state-of-the-art approach with up to 56.17% size reduction at iso-accuracy. With respect to the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results, but with significantly lower training time. In addition, we show how well-tailored cost models can improve the cost versus accuracy trade-offs when targeting specific hardware for deployment.
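The joint search the abstract describes can be illustrated with a differentiable, DNAS-style bit-width selection in which a 0-bit option subsumes channel pruning. The following numpy sketch is illustrative only: the candidate set, cost model, and function names are assumptions, not the paper's actual implementation.

```python
import numpy as np

# Candidate per-channel bit-widths; 0 bits encodes a pruned channel, so
# pruning and mixed-precision quantization share one search space.
# (The exact candidate set is an assumption, not stated in the abstract.)
CANDIDATES = np.array([0, 2, 4, 8])

def quantize(w, bits):
    """Uniform symmetric fake-quantization of a weight vector to `bits` bits."""
    if bits == 0:
        return np.zeros_like(w)  # pruned channel
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def effective_channel(w, logits):
    """Soft mixture of quantized versions of one channel's weights.

    During training, gradients would flow into `logits`, letting the
    search pick a bit-width (or prune entirely) per channel."""
    p = softmax(logits)
    return sum(pi * quantize(w, b) for pi, b in zip(p, CANDIDATES))

def expected_memory_bits(all_logits, weights_per_channel):
    """Differentiable memory-cost proxy: expected bit-width times the
    number of weights in each channel, summed over channels. A term like
    this would be added to the task loss to steer the search."""
    return sum(softmax(l) @ CANDIDATES * n
               for l, n in zip(all_logits, weights_per_channel))
```

In a full training loop, the task loss plus a scaled `expected_memory_bits` (or a hardware-aware latency model in its place) would be minimized jointly over the weights and the per-channel logits, after which each channel is assigned its argmax bit-width.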
Pages: 2619-2633 (15 pages)