Learning Student Networks via Feature Embedding

Cited by: 53
Authors
Chen, Hanting [1 ]
Wang, Yunhe [2 ]
Xu, Chang [3 ]
Xu, Chao [1 ]
Tao, Dacheng [3 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci EECS, Cooperat Medianet Innovat Ctr, Key Lab Machine Percept,Minist Educ, Beijing 100871, Peoples R China
[2] Huawei Technol Co Ltd, Noahs Ark Lab, Beijing 100085, Peoples R China
[3] Univ Sydney, Fac Engn, Sch Comp Sci, Darlington, NSW 2008, Australia
Funding
Australian Research Council; National Natural Science Foundation of China;
Keywords
Knowledge engineering; Convolution; Training; Learning systems; Computational complexity; Graphics processing units; Mobile handsets; Deep learning; knowledge distillation (KD); teacher-student learning;
DOI
10.1109/TNNLS.2020.2970494
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Deep convolutional neural networks have been widely used in numerous applications, but their demanding storage and computational resource requirements prevent their deployment on mobile devices. Knowledge distillation aims to optimize a portable student network by taking the knowledge from a well-trained heavy teacher network. Traditional teacher-student methods rely on additional fully connected layers to bridge the intermediate layers of the teacher and student networks, which introduces a large number of auxiliary parameters. In contrast, this article aims to propagate information from teacher to student without introducing new variables that need to be optimized. We regard the teacher-student paradigm from a new perspective of feature embedding. By introducing a locality preserving loss, the student network is encouraged to generate low-dimensional features that inherit the intrinsic properties of the corresponding high-dimensional features from the teacher network. The resulting portable network can thus naturally maintain performance comparable to that of the teacher network. Theoretical analysis is provided to justify the lower computational complexity of the proposed method. Experiments on benchmark data sets and well-trained networks suggest that the proposed algorithm is superior to state-of-the-art teacher-student learning methods in terms of computational and storage complexity.
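The locality preserving loss described in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch implementation (the paper's exact formulation may differ): teacher features define a k-nearest-neighbor affinity graph with heat-kernel weights, in the style of Laplacian eigenmaps, and the student is penalized when samples that are close in the teacher's feature space are mapped far apart. The function name and the hyperparameters `k` and `t` are assumptions made for illustration.

```python
import torch

def locality_preserving_loss(teacher_feats, student_feats, k=5, t=1.0):
    """Illustrative sketch of a locality-preserving distillation loss.

    teacher_feats: (n, D) high-dimensional teacher features for a batch,
                   assumed detached so no gradient flows to the teacher.
    student_feats: (n, d) low-dimensional student features, same samples.
    k: number of nearest neighbors in teacher feature space (assumed).
    t: heat-kernel temperature (assumed).
    """
    n = teacher_feats.size(0)

    # Squared Euclidean distances among teacher features, shape (n, n).
    d_t = torch.cdist(teacher_feats, teacher_feats).pow(2)

    # Heat-kernel affinities, as in Laplacian eigenmaps.
    w = torch.exp(-d_t / t)

    # Restrict the graph to each sample's k nearest neighbors
    # (excluding self-loops), then symmetrize it.
    d_t.fill_diagonal_(float('inf'))
    knn_idx = d_t.topk(k, largest=False).indices          # (n, k)
    mask = torch.zeros_like(w).scatter_(1, knn_idx, 1.0)
    mask = torch.max(mask, mask.t())
    w = w * mask

    # Penalize the student for separating samples that are
    # neighbors in the teacher's feature space.
    d_s = torch.cdist(student_feats, student_feats).pow(2)
    return (w * d_s).sum() / n
```

In practice such a term would be added to the student's usual training objective (e.g., cross-entropy on ground-truth labels), so that the student both fits the task and preserves the local geometry of the teacher's feature space.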
Pages: 25-35
Page count: 11