PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation

Cited by: 9
Authors
Kim, Jangho [1 ,2 ]
Chang, Simyung [1 ]
Kwak, Nojun [2 ]
Affiliations
[1] Qualcomm Korea YH, Qualcomm AI Res, Seoul, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
Source
INTERSPEECH 2021, 2021
Funding
National Research Foundation, Singapore
Keywords
keyword spotting; model pruning; model quantization; knowledge distillation;
DOI
10.21437/Interspeech.2021-248
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes
100104; 100213
Abstract
As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on them has become a critical issue. However, DNNs require high computational resources that are rarely available on edge devices. To address this, we propose a novel model compression method for devices with limited computational resources, called PQK, consisting of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of the unimportant weights pruned in the pruning process to build a teacher network for training a better student network, without pre-training the teacher model. PQK has two phases. Phase 1 exploits iterative pruning and quantization-aware training to make a lightweight and power-efficient model. In phase 2, we make a teacher network by adding the unimportant weights unused in phase 1 back to the pruned network. Using this teacher network, we train the pruned network as a student network. In doing so, we do not need a pre-trained teacher network for the KD framework because the teacher and the student networks coexist within the same network (see Fig. 1). We apply our method to recognition models and verify the effectiveness of PQK on keyword spotting (KWS) and image recognition.
Pages: 4568-4572
Number of pages: 5
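
The abstract describes a two-phase procedure in which the teacher and the student coexist within the same network. The sketch below illustrates that core idea only, under stated assumptions: a magnitude-based pruning mask stands in for phase 1 (the quantization-aware training step is omitted), and in phase 2 the same weight tensors are run once with the mask disabled to obtain teacher logits and once with the mask enabled as the student. The tiny MLP, input dimension, class count, sparsity, temperature, and loss weighting are illustrative choices, not the authors' implementation.

```python
# Minimal PQK-style sketch, assuming a PyTorch setup (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Linear):
    """Linear layer that can run with its pruning mask (student path)
    or with the pruned weights added back (teacher path)."""

    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        self.register_buffer("mask", torch.ones_like(self.weight))

    def set_magnitude_mask(self, sparsity):
        # Phase-1 style pruning: zero out the smallest-magnitude weights.
        k = int(self.weight.numel() * sparsity)
        threshold = self.weight.abs().flatten().kthvalue(k).values
        self.mask.copy_((self.weight.abs() > threshold).float())

    def forward(self, x, use_mask=True):
        w = self.weight * self.mask if use_mask else self.weight
        return F.linear(x, w, self.bias)


class TinyNet(nn.Module):
    # Illustrative KWS-like shapes: 40-dim features, 12 keyword classes.
    def __init__(self, in_dim=40, hidden=128, n_classes=12):
        super().__init__()
        self.fc1 = MaskedLinear(in_dim, hidden)
        self.fc2 = MaskedLinear(hidden, n_classes)

    def forward(self, x, use_mask=True):
        h = F.relu(self.fc1(x, use_mask))
        return self.fc2(h, use_mask)


def pqk_phase2_step(model, x, y, optimizer, T=4.0, alpha=0.5):
    # Teacher: the pruned network plus the unimportant weights (mask disabled);
    # detached here for simplicity, and no separately pre-trained teacher is needed.
    with torch.no_grad():
        t_logits = model(x, use_mask=False)
    # Student: the pruned network (mask enabled); only unpruned weights get gradients.
    s_logits = model(x, use_mask=True)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    loss = alpha * kd + (1.0 - alpha) * F.cross_entropy(s_logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = TinyNet()
    for layer in (model.fc1, model.fc2):
        layer.set_magnitude_mask(sparsity=0.7)   # stands in for phase-1 pruning
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x, y = torch.randn(8, 40), torch.randint(0, 12, (8,))
    print(pqk_phase2_step(model, x, y, optimizer))
```

The point of sharing one set of weight tensors is that no pre-trained teacher is required: the teacher is simply the student plus the weights that pruning marked as unimportant.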