PrUE: Distilling Knowledge from Sparse Teacher Networks

Cited by: 0
Authors
Wang, Shaopu [1 ,2 ]
Chen, Xiaojun [2 ]
Kou, Mengzhen [1 ,2 ]
Shi, Jinqiao [3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[3] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023 / Vol. 13715
Keywords
Knowledge distillation; Network pruning; Deep learning
DOI
10.1007/978-3-031-26409-2_7
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Although deep neural networks have enjoyed remarkable success across a wide variety of tasks, their ever-increasing size also imposes significant overhead on deployment. To compress these models, knowledge distillation was proposed to transfer knowledge from a cumbersome (teacher) network into a lightweight (student) network. However, guidance from a teacher does not always improve the generalization of students, especially when the size gap between student and teacher is large. Previous works argued that this was due to the high certainty of the teacher, resulting in harder labels that were difficult to fit. To soften these labels, we present a pruning method termed Prediction Uncertainty Enlargement (PrUE) to simplify the teacher. Specifically, our method aims to decrease the teacher's certainty about data, thereby generating soft predictions for students. We empirically investigate the effectiveness of the proposed method with experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet. Results indicate that student networks trained with sparse teachers achieve better performance. Moreover, our method allows researchers to distill knowledge from deeper networks to improve students further. Our code is made public at: https://github.com/wangshaopu/prue.
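The abstract does not detail the uncertainty-enlarging pruning criterion itself, so the sketch below only illustrates the overall pipeline it describes: sparsify the teacher, then train the student on its temperature-softened predictions. This is a minimal PyTorch sketch assuming plain global magnitude pruning as a stand-in for PrUE; the function names, pruning amount, temperature, and loss weighting are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch: distillation from a sparsified teacher (illustrative only).
# Global magnitude pruning stands in for the paper's uncertainty-enlarging
# criterion; hyperparameters below are arbitrary.
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def sparsify_teacher(teacher: torch.nn.Module, amount: float = 0.9) -> torch.nn.Module:
    """Prune conv/linear weights globally by L1 magnitude (stand-in criterion)."""
    params = [(m, "weight") for m in teacher.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    return teacher


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard temperature-scaled distillation loss combined with cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


# Usage inside a training step (teacher frozen):
#   teacher = sparsify_teacher(teacher).eval()
#   with torch.no_grad():
#       t_logits = teacher(x)
#   loss = kd_loss(student(x), t_logits, y)
```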
Pages: 102-117
Page count: 16