Collaborative Multiple-Student Single-Teacher for Online Learning

Times Cited: 0
Authors
Zain, Alaa [1 ]
Jian, Yang [1 ]
Zhou, Jinjia [1 ]
Affiliations
[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo, Japan
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT I | 2022 / Volume 13529
Keywords
Knowledge distillation; Deep convolutional model compression; Transfer learning; Label smoothing; KNOWLEDGE DISTILLATION;
DOI
10.1007/978-3-031-15919-0_43
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation is a popular method in which a large trained network (the teacher) is used to train a smaller network (the student). To remove the need to train a much larger teacher network, one-student self-knowledge distillation was introduced as a solid technique for compressing neural networks, especially for real-time applications. However, most existing methods consider only one type of knowledge and apply a one-student one-teacher learning strategy. This paper presents a collaborative multiple-student single-teacher system (CMSST). The proposed approach targets real-time applications that contain temporal information, which plays an important role in understanding. We designed a backbone old-student network with the target complexity for deployment. During training, the old student provides high-quality soft labels to guide the hierarchical new student, while also giving the new student the opportunity to make meaningful improvements based on the students' revised feedback via shared intermediate representations. Moreover, we introduced a soft-target label smoothing technique into the CMSST. Experimental results showed that accuracy improved by 1.5% over the newly developed teacher knowledge distillation on UCF-101, and by 1.15% compared to conventional large-teacher knowledge distillation on the CIFAR-100 dataset.
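The record does not include the paper's equations, but the core mechanism named in the abstract, a student trained on temperature-softened soft labels from another network combined with label smoothing on the hard targets, can be made concrete with a short sketch. The snippet below is a minimal PyTorch illustration under stated assumptions, not the authors' CMSST implementation: the function name distillation_loss and the values of the temperature T, the weight alpha, and the smoothing factor epsilon are illustrative choices only.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets,
                          T=4.0, alpha=0.7, epsilon=0.1):
        # Illustrative loss: hyperparameters are assumptions, not values from the paper.
        num_classes = student_logits.size(1)

        # Soft targets: the guiding network's probabilities softened by temperature T.
        soft_targets = F.softmax(teacher_logits / T, dim=1)
        log_student = F.log_softmax(student_logits / T, dim=1)
        kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)

        # Label-smoothed cross-entropy on the hard labels (smoothing factor epsilon).
        smoothed = torch.full_like(student_logits, epsilon / (num_classes - 1))
        smoothed.scatter_(1, targets.unsqueeze(1), 1.0 - epsilon)
        ce = -(smoothed * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()

        # Weighted sum of the distillation term and the supervised term.
        return alpha * kd + (1.0 - alpha) * ce

    # Usage with random tensors (batch of 8, 100 classes, e.g. CIFAR-100):
    student_logits = torch.randn(8, 100)
    teacher_logits = torch.randn(8, 100)
    targets = torch.randint(0, 100, (8,))
    loss = distillation_loss(student_logits, teacher_logits, targets)

In the paper itself the soft labels come from the old student rather than a separately trained large teacher; the sketch only illustrates the generic soft-label plus label-smoothing structure the abstract refers to.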
Pages: 515-525
Page count: 11