An adaptive joint optimization framework for pruning and quantization

Cited by: 1
Authors
Li, Xiaohai [1 ,2 ,3 ]
Yang, Xiaodong [1 ,2 ,3 ]
Zhang, Yingwei [1 ,2 ,3 ]
Yang, Jianrong [4 ,5 ]
Chen, Yiqiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Beijing Key Lab Mobile Comp & Pervas Device, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
[4] Guangxi Acad Med Sci, Peoples Hosp Guangxi Zhuang Autonomous Reg, Dept Hepatobiliary Pancreas & Spleen Surg, Nanning, Peoples R China
[5] Peoples Hosp Guangxi Zhuang Autonomous Reg, Guangxi Clin Res Ctr Sleep Med, Nanning, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Model compression; Network pruning; Quantization; Mutual learning; Multi-teacher knowledge distillation;
DOI
10.1007/s13042-024-02229-w
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pruning and quantization are among the most widely used techniques for deep learning model compression, and applying them together holds the potential for even greater performance gains. Most existing works combine pruning and quantization sequentially. However, this separation makes it difficult to fully leverage their complementarity and exploit the potential benefits of joint optimization. To address the limitations of existing methods, we propose A-JOPQ (adaptive joint optimization of pruning and quantization), an adaptive joint optimization framework for pruning and quantization. Starting from a deep neural network, A-JOPQ first constructs a pruning network through adaptive mutual learning with a quantization network. This process compensates for the loss of structural information during pruning. Subsequently, the pruning network is incrementally quantized using adaptive multi-teacher knowledge distillation from both itself and the original uncompressed model. This approach effectively mitigates the adverse effects of quantization. Finally, A-JOPQ produces a pruning-quantization network that achieves significant model compression while maintaining high accuracy. Extensive experiments on several public datasets demonstrate the superiority of the proposed method: compared to existing methods, A-JOPQ achieves higher accuracy with a smaller model size. Additionally, we extend A-JOPQ to federated learning (FL) settings. Simulation experiments show that A-JOPQ can enhance FL by enabling resource-limited clients to participate effectively.
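The record does not spell out A-JOPQ's mutual-learning or distillation procedures, but the building blocks the abstract names (pruning, quantization, and a multi-teacher distillation loss) can be illustrated in a minimal NumPy sketch. Everything below is an illustrative assumption: magnitude pruning, uniform symmetric quantization, and an MSE-on-logits loss are common stand-ins, not the paper's actual algorithm, and all function names are hypothetical.

```python
import numpy as np

def prune_weights(w, sparsity):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(np.floor(sparsity * w.size))
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w).ravel())[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_weights(w, bits):
    """Uniform symmetric quantization to a signed `bits`-bit grid (dequantized back)."""
    qmax = 2 ** (bits - 1) - 1
    wmax = np.max(np.abs(w))
    scale = wmax / qmax if wmax > 0 else 1.0
    return np.round(w / scale) * scale

def multi_teacher_distill_loss(student_logits, teacher_logits_list, teacher_weights):
    """Weighted multi-teacher distillation loss (MSE on logits as a simple stand-in)."""
    return sum(a * np.mean((student_logits - t) ** 2)
               for a, t in zip(teacher_weights, teacher_logits_list))

# Toy walk-through: prune half the weights, then quantize the survivors to 8 bits.
w = np.array([0.1, -0.9, 0.05, 1.2])
w_pruned = prune_weights(w, sparsity=0.5)        # smallest two magnitudes become 0
w_pq = quantize_weights(w_pruned, bits=8)        # 8-bit uniform grid, zeros preserved
```

In a full pipeline, the teacher set would include the uncompressed model and the pruning network itself, with the per-teacher weights adapted during training; here they are fixed constants purely for illustration.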
Pages: 5199-5215 (17 pages)