Collaborative Multiple-Student Single-Teacher for Online Learning

Times Cited: 0
Authors
Zain, Alaa [1 ]
Jian, Yang [1 ]
Zhou, Jinjia [1 ]
Affiliations
[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo, Japan
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT I | 2022 / Volume 13529
Keywords
Knowledge distillation; Deep convolutional model compression; Transfer learning; Label smoothing; KNOWLEDGE DISTILLATION;
DOI
10.1007/978-3-031-15919-0_43
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is a popular method in which a large trained network (the teacher) is used to train a smaller network (the student). To remove the need to train a much larger teacher network, one-student self-knowledge distillation was introduced as a solid technique for compressing neural networks, especially for real-time applications. However, most existing methods consider only one type of knowledge and apply a one-student one-teacher learning strategy. This paper presents a collaborative multiple-student single-teacher system (CMSST). The proposed approach targets real-time applications that contain temporal information, which plays an important role in understanding. We designed a backbone old-student network with the target complexity for deployment. During training, the old student provides high-quality soft labels to guide the hierarchical new student, while also giving the new student the opportunity to make meaningful improvements based on the students' revised feedback via shared intermediate representations. Moreover, we introduced a soft-target label smoothing technique into the CMSST. Experimental results showed that accuracy improved by 1.5% over the newly developed teacher knowledge distillation on UCF-101, and by 1.15% compared to conventional large-teacher knowledge distillation on the CIFAR-100 dataset.
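The record does not include the paper's equations, but the core mechanism named in the abstract, a student trained on temperature-softened soft labels from another network combined with label smoothing on the hard targets, can be made concrete with a short sketch. The snippet below is a minimal PyTorch illustration under stated assumptions, not the authors' CMSST implementation: the function name distillation_loss and the values of the temperature T, the weight alpha, and the smoothing factor epsilon are illustrative choices only.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets,
                          T=4.0, alpha=0.7, epsilon=0.1):
        # Illustrative loss: hyperparameters are assumptions, not values from the paper.
        num_classes = student_logits.size(1)

        # Soft targets: the guiding network's probabilities softened by temperature T.
        soft_targets = F.softmax(teacher_logits / T, dim=1)
        log_student = F.log_softmax(student_logits / T, dim=1)
        kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)

        # Label-smoothed cross-entropy on the hard labels (smoothing factor epsilon).
        smoothed = torch.full_like(student_logits, epsilon / (num_classes - 1))
        smoothed.scatter_(1, targets.unsqueeze(1), 1.0 - epsilon)
        ce = -(smoothed * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()

        # Weighted sum of the distillation term and the supervised term.
        return alpha * kd + (1.0 - alpha) * ce

    # Usage with random tensors (batch of 8, 100 classes, e.g. CIFAR-100):
    student_logits = torch.randn(8, 100)
    teacher_logits = torch.randn(8, 100)
    targets = torch.randint(0, 100, (8,))
    loss = distillation_loss(student_logits, teacher_logits, targets)

In the paper itself the soft labels come from the old student rather than a separately trained large teacher; the sketch only illustrates the generic soft-label plus label-smoothing structure the abstract refers to.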
Pages: 515-525
Page count: 11