Learning Student Networks via Feature Embedding

Cited by: 53
Authors
Chen, Hanting [1 ]
Wang, Yunhe [2 ]
Xu, Chang [3 ]
Xu, Chao [1 ]
Tao, Dacheng [3 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci EECS, Cooperat Medianet Innovat Ctr, Key Lab Machine Percept,Minist Educ, Beijing 100871, Peoples R China
[2] Huawei Technol Co Ltd, Noahs Ark Lab, Beijing 100085, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Darlington, NSW 2008, Australia
Funding
Australian Research Council; National Natural Science Foundation of China;
Keywords
Knowledge engineering; Convolution; Training; Learning systems; Computational complexity; Graphics processing units; Mobile handsets; Deep learning; knowledge distillation (KD); teacher-student learning;
DOI
10.1109/TNNLS.2020.2970494
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Deep convolutional neural networks have been widely used in numerous applications, but their demanding storage and computational resource requirements prevent their deployment on mobile devices. Knowledge distillation aims to optimize a portable student network by taking the knowledge from a well-trained heavy teacher network. Traditional teacher-student methods rely on additional fully connected layers to bridge the intermediate layers of the teacher and student networks, which introduces a large number of auxiliary parameters. In contrast, this article aims to propagate information from teacher to student without introducing new variables that need to be optimized. We regard the teacher-student paradigm from a new perspective of feature embedding. By introducing a locality preserving loss, the student network is encouraged to generate low-dimensional features that inherit the intrinsic properties of the corresponding high-dimensional features from the teacher network. The resulting portable network can thus naturally maintain performance comparable to that of the teacher network. Theoretical analysis is provided to justify the lower computational complexity of the proposed method. Experiments on benchmark data sets and well-trained networks suggest that the proposed algorithm is superior to state-of-the-art teacher-student learning methods in terms of computational and storage complexity.
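The locality preserving loss described in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch implementation (the paper's exact formulation may differ): teacher features define a k-nearest-neighbor affinity graph with heat-kernel weights, in the style of Laplacian eigenmaps, and the student is penalized when samples that are close in the teacher's feature space are mapped far apart. The function name and the hyperparameters `k` and `t` are assumptions made for illustration.

```python
import torch

def locality_preserving_loss(teacher_feats, student_feats, k=5, t=1.0):
    """Illustrative sketch of a locality-preserving distillation loss.

    teacher_feats: (n, D) high-dimensional teacher features for a batch,
                   assumed detached so no gradient flows to the teacher.
    student_feats: (n, d) low-dimensional student features, same samples.
    k: number of nearest neighbors in teacher feature space (assumed).
    t: heat-kernel temperature (assumed).
    """
    n = teacher_feats.size(0)

    # Squared Euclidean distances among teacher features, shape (n, n).
    d_t = torch.cdist(teacher_feats, teacher_feats).pow(2)

    # Heat-kernel affinities, as in Laplacian eigenmaps.
    w = torch.exp(-d_t / t)

    # Restrict the graph to each sample's k nearest neighbors
    # (excluding self-loops), then symmetrize it.
    d_t.fill_diagonal_(float('inf'))
    knn_idx = d_t.topk(k, largest=False).indices          # (n, k)
    mask = torch.zeros_like(w).scatter_(1, knn_idx, 1.0)
    mask = torch.max(mask, mask.t())
    w = w * mask

    # Penalize the student for separating samples that are
    # neighbors in the teacher's feature space.
    d_s = torch.cdist(student_feats, student_feats).pow(2)
    return (w * d_s).sum() / n
```

In practice such a term would be added to the student's usual training objective (e.g., cross-entropy on ground-truth labels), so that the student both fits the task and preserves the local geometry of the teacher's feature space.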
Pages: 25-35
Page count: 11