Skill-Transferring Knowledge Distillation Method

Cited by: 13
Authors
Yang, Shunzhi [1 ,2 ]
Xu, Liuchi [2 ]
Zhou, Mengchu [3 ,4 ]
Yang, Xiong [1 ,5 ]
Yang, Jinfeng [1 ,5 ]
Huang, Zhenhua [2 ]
Affiliations
[1] Shenzhen Polytech, Inst Appl Artificial Intelligence Guangdong Hong K, Shenzhen 518055, Guangdong, Peoples R China
[2] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China
[3] Zhejiang Gongshang Univ, Sch Informat & Elect Engn, Hangzhou 314423, Peoples R China
[4] New Jersey Inst Technol, Dept ECE, Newark, NJ 07102 USA
[5] Shenzhen Polytech, Inst Appl Artificial Intelligence, Guangdong Hong Kong Macao Greater Bay Area, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation; human teaching experience; knowledge and skills; object recognition; edge computing devices; machine learning; meta-learning; REPRESENTATION; RECOGNITION;
DOI
10.1109/TCSVT.2023.3271124
CLC classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification code
0808; 0809;
Abstract
Knowledge distillation is a deep learning method that mimics the way humans teach: a teacher network guides the training of a student network. Knowledge distillation can produce an efficient student network that is easy to deploy on resource-constrained edge computing devices. Existing studies typically mine knowledge from a teacher network and transfer it to a student one. The latter can only passively receive knowledge but cannot understand how the former acquires it, which limits the student's performance improvement. Inspired by the old Chinese saying "Give a man a fish and you feed him for a day; teach a man how to fish and you feed him for a lifetime," this work proposes a Skill-transferring Knowledge Distillation (SKD) method to boost a student network's ability to create new valuable knowledge. SKD consists of two main meta-learning networks: Teacher Behavior Teaching and Teacher Experience Teaching. The former captures the teacher network's learning behavior in the hidden layers and can predict the teacher's subsequent behavior from its previous ones. The latter models the optimal empirical knowledge of the teacher network's output layer at each learning stage. With their help, a teacher network can provide a student network with its subsequent actions and its optimal empirical knowledge at the current stage. SKD's performance is verified through its application to multiple object recognition tasks and comparison with the state of the art.
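SKD builds on standard knowledge distillation, in which the student is trained to match the teacher's temperature-softened output distribution. Below is a minimal NumPy sketch of that classical soft-target loss (Hinton et al.'s formulation), shown only as background for the kind of teacher-to-student transfer the abstract describes; it is not the SKD method itself, and the function names are illustrative:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T yields a softer distribution.
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 so gradients keep a comparable magnitude across T.
    p = softmax(teacher_logits, T)   # soft targets from the teacher
    q = softmax(student_logits, T)   # student's softened predictions
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))
```

In classical distillation this term is added to the ordinary cross-entropy on ground-truth labels; SKD's contribution, per the abstract, is to additionally transfer the teacher's learning behavior and stage-wise empirical knowledge rather than only its final outputs.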
Pages: 6487-6502
Page count: 16