Skill-Transferring Knowledge Distillation Method

Times cited: 13
Authors
Yang, Shunzhi [1 ,2 ]
Xu, Liuchi [2 ]
Zhou, Mengchu [3 ,4 ]
Yang, Xiong [1 ,5 ]
Yang, Jinfeng [1 ,5 ]
Huang, Zhenhua [2 ]
Affiliations
[1] Shenzhen Polytech, Inst Appl Artificial Intelligence, Guangdong Hong Kong Macao Greater Bay Area, Shenzhen 518055, Guangdong, Peoples R China
[2] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China
[3] Zhejiang Gongshang Univ, Sch Informat & Elect Engn, Hangzhou 314423, Peoples R China
[4] New Jersey Inst Technol, Dept ECE, Newark, NJ 07102 USA
[5] Shenzhen Polytech, Inst Appl Artificial Intelligence, Guangdong Hong Kong Macao Greater Bay Area, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge distillation; human teaching experience; knowledge and skills; object recognition; edge computing devices; machine learning; meta-learning; representation; recognition
DOI
10.1109/TCSVT.2023.3271124
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
Knowledge distillation is a deep learning method that mimics human teaching: a teacher network guides the training of a student network. It can produce an efficient student network suited to deployment on resource-constrained edge computing devices. Existing studies typically mine knowledge from a teacher network and transfer it to a student network. The student can only passively receive this knowledge and cannot understand how the teacher acquired it, which limits the student's performance gains. Inspired by the old Chinese saying "Give a man a fish and you feed him for a day; teach a man how to fish and you feed him for a lifetime," this work proposes a Skill-transferring Knowledge Distillation (SKD) method that boosts a student network's ability to create new, valuable knowledge. SKD consists of two main meta-learning networks: Teacher Behavior Teaching and Teacher Experience Teaching. The former captures the learning behavior of the teacher network's hidden layers and predicts the teacher's subsequent behavior from its previous behavior. The latter models the optimal empirical knowledge of the teacher network's output layer at each learning stage. With their help, a teacher network can provide a student network with both its subsequent actions and its optimal empirical knowledge at the current stage. SKD's performance is verified on multiple object recognition tasks and through comparison with state-of-the-art methods.
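The abstract describes the teacher-guides-student setup only at a high level. For reference, the sketch below shows the classic response-based distillation loss of Hinton et al. (2015), where a student matches a teacher's temperature-softened outputs. This is not the SKD method: it does not model Teacher Behavior Teaching or Teacher Experience Teaching. The names `kd_loss`, `temperature`, and `alpha` and their default values are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            label: int,
            temperature: float = 4.0,   # assumed value, for illustration only
            alpha: float = 0.5) -> float:  # assumed soft/hard weighting
    """Classic response-based KD loss (Hinton et al., 2015), NOT the SKD objective.

    alpha weights the soft (teacher) term against the hard (label) term.
    The temperature**2 factor keeps the soft term's gradient scale comparable.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(p_t || p_s): distance of the student's softened distribution from the teacher's
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    # Ordinary cross-entropy against the ground-truth label at temperature 1
    ce = -math.log(softmax(student_logits)[label])
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # hypothetical teacher logits for 3 classes
    student = [1.5, 0.7, -0.8]  # hypothetical student logits
    print(kd_loss(student, teacher, label=0))
```

When the student's logits equal the teacher's, the KL term vanishes and only the label term remains. Raising `temperature` flattens both distributions, which exposes more of the teacher's relative preferences among wrong classes.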
Pages: 6487-6502
Number of pages: 16