Skill-Transferring Knowledge Distillation Method

Cited by: 13
Authors
Yang, Shunzhi [1 ,2 ]
Xu, Liuchi [2 ]
Zhou, Mengchu [3 ,4 ]
Yang, Xiong [1 ,5 ]
Yang, Jinfeng [1 ,5 ]
Huang, Zhenhua [2 ]
Affiliations
[1] Shenzhen Polytech, Inst Appl Artificial Intelligence Guangdong Hong K, Shenzhen 518055, Guangdong, Peoples R China
[2] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China
[3] Zhejiang Gongshang Univ, Sch Informat & Elect Engn, Hangzhou 314423, Peoples R China
[4] New Jersey Inst Technol, Dept ECE, Newark, NJ 07102 USA
[5] Shenzhen Polytech, Inst Appl Artificial Intelligence, Guangdong Hong Kong Macao Greater Bay Area, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; human teaching experience; knowledge and skills; object recognition; edge computing devices; machine learning; meta-learning; REPRESENTATION; RECOGNITION;
DOI
10.1109/TCSVT.2023.3271124
CLC classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification code
0808; 0809;
Abstract
Knowledge distillation is a deep learning method that mimics the way humans teach: a teacher network guides the training of a student network. Knowledge distillation can produce an efficient student network that is easy to deploy on resource-constrained edge computing devices. Existing studies typically mine knowledge from a teacher network and transfer it to a student one. The latter can only passively receive knowledge but cannot understand how the former acquires it, which limits the student's performance improvement. Inspired by the old Chinese saying "Give a man a fish and you feed him for a day; teach a man how to fish and you feed him for a lifetime," this work proposes a Skill-transferring Knowledge Distillation (SKD) method to boost a student network's ability to create new valuable knowledge. SKD consists of two main meta-learning networks: Teacher Behavior Teaching and Teacher Experience Teaching. The former captures the teacher network's learning behavior in the hidden layers and can predict the teacher's subsequent behavior from its previous ones. The latter models the optimal empirical knowledge of the teacher network's output layer at each learning stage. With their help, a teacher network can provide a student network with its subsequent actions and its optimal empirical knowledge at the current stage. SKD's performance is verified through its application to multiple object recognition tasks and comparison with the state of the art.
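SKD builds on standard knowledge distillation, in which the student is trained to match the teacher's temperature-softened output distribution. Below is a minimal NumPy sketch of that classical soft-target loss (Hinton et al.'s formulation), shown only as background for the kind of teacher-to-student transfer the abstract describes; it is not the SKD method itself, and the function names are illustrative:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T yields a softer distribution.
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 so gradients keep a comparable magnitude across T.
    p = softmax(teacher_logits, T)   # soft targets from the teacher
    q = softmax(student_logits, T)   # student's softened predictions
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))
```

In classical distillation this term is added to the ordinary cross-entropy on ground-truth labels; SKD's contribution, per the abstract, is to additionally transfer the teacher's learning behavior and stage-wise empirical knowledge rather than only its final outputs.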
Pages: 6487-6502
Page count: 16