PrUE: Distilling Knowledge from Sparse Teacher Networks

Cited by: 0
Authors
Wang, Shaopu [1 ,2 ]
Chen, Xiaojun [2 ]
Kou, Mengzhen [1 ,2 ]
Shi, Jinqiao [3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[3] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023 / Vol. 13715
Keywords
Knowledge distillation; Network pruning; Deep learning
DOI
10.1007/978-3-031-26409-2_7
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Although deep neural networks have enjoyed remarkable success across a wide variety of tasks, their ever-increasing size also imposes significant overhead on deployment. To compress these models, knowledge distillation was proposed to transfer knowledge from a cumbersome (teacher) network into a lightweight (student) network. However, guidance from a teacher does not always improve the generalization of students, especially when the size gap between student and teacher is large. Previous works argued that this was due to the high certainty of the teacher, resulting in harder labels that were difficult to fit. To soften these labels, we present a pruning method termed Prediction Uncertainty Enlargement (PrUE) to simplify the teacher. Specifically, our method aims to decrease the teacher's certainty about data, thereby generating soft predictions for students. We empirically investigate the effectiveness of the proposed method with experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet. Results indicate that student networks trained with sparse teachers achieve better performance. Moreover, our method allows researchers to distill knowledge from deeper networks to improve students further. Our code is made public at: https://github.com/wangshaopu/prue.
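The abstract does not detail the uncertainty-enlarging pruning criterion itself, so the sketch below only illustrates the overall pipeline it describes: sparsify the teacher, then train the student on its temperature-softened predictions. This is a minimal PyTorch sketch assuming plain global magnitude pruning as a stand-in for PrUE; the function names, pruning amount, temperature, and loss weighting are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch: distillation from a sparsified teacher (illustrative only).
# Global magnitude pruning stands in for the paper's uncertainty-enlarging
# criterion; hyperparameters below are arbitrary.
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def sparsify_teacher(teacher: torch.nn.Module, amount: float = 0.9) -> torch.nn.Module:
    """Prune conv/linear weights globally by L1 magnitude (stand-in criterion)."""
    params = [(m, "weight") for m in teacher.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    return teacher


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard temperature-scaled distillation loss combined with cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


# Usage inside a training step (teacher frozen):
#   teacher = sparsify_teacher(teacher).eval()
#   with torch.no_grad():
#       t_logits = teacher(x)
#   loss = kd_loss(student(x), t_logits, y)
```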
Pages: 102-117
Page count: 16