Few Sample Knowledge Distillation for Efficient Network Compression

Cited by: 98
Authors
Li, Tianhong [1 ,2 ]
Li, Jianguo [2 ]
Liu, Zhuang [3 ]
Zhang, Changshui [4 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Intel Labs, Santa Clara, CA USA
[3] Univ Calif Berkeley, Berkeley, CA USA
[4] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
DOI
10.1109/CVPR42600.2020.01465
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural network compression techniques such as pruning and weight tensor decomposition usually require fine-tuning to recover the prediction accuracy when the compression ratio is high. However, conventional fine-tuning suffers from the requirement of a large training set and a time-consuming training procedure. This paper proposes a novel solution for knowledge distillation from label-free few samples to realize both data efficiency and training/processing efficiency. We treat the original network as "teacher-net" and the compressed network as "student-net". A 1×1 convolution layer is added at the end of each layer block of the student-net, and we fit the block-level outputs of the student-net to the teacher-net by estimating the parameters of the added layers. We prove that the added layer can be merged without extra parameters or computation cost during inference. Experiments on multiple datasets and network architectures verify the method's effectiveness on student-nets obtained by various network pruning and weight decomposition methods. Our method can recover student-net's accuracy to the same level as conventional fine-tuning methods in minutes while using only 1% label-free data of the full training data.
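The recipe described in the abstract, least-squares fitting of an added 1×1 convolution at each block and then merging it away, can be illustrated concretely. Below is a minimal PyTorch sketch, not the authors' released code: it assumes the teacher and student block outputs share the same channel count (so the fitted matrix Q is square) and that the added 1×1 layer sits directly after the block's final convolution with no nonlinearity in between, so the merge is exact. All names and shapes are our illustration.

import torch

@torch.no_grad()
def fit_pointwise_conv(student_feat, teacher_feat):
    # Estimate the added 1x1 conv as a channel-mixing matrix Q (no bias)
    # that maps the student block output onto the teacher block output
    # by least squares, using only a few unlabeled samples.
    c_s = student_feat.shape[1]
    c_t = teacher_feat.shape[1]
    X = student_feat.permute(1, 0, 2, 3).reshape(c_s, -1)  # (C_s, N*H*W)
    Y = teacher_feat.permute(1, 0, 2, 3).reshape(c_t, -1)  # (C_t, N*H*W)
    # Solve min_Q ||Q X - Y||_F, written as X^T Q^T ~= Y^T for lstsq.
    return torch.linalg.lstsq(X.T, Y.T).solution.T         # (C_t, C_s)

@torch.no_grad()
def merge_pointwise_into_conv(conv, Q):
    # Fold the fitted 1x1 conv into the block's last conv layer, so
    # inference incurs no extra parameters or FLOPs. Assumes Q is
    # square and nothing nonlinear sits between the two layers.
    conv.weight.copy_(torch.einsum('oc,cikl->oikl', Q, conv.weight))
    if conv.bias is not None:
        conv.bias.copy_(Q @ conv.bias)

# Consistency check on random tensors (shapes are hypothetical):
conv = torch.nn.Conv2d(16, 32, 3, padding=1)
x = torch.randn(8, 16, 14, 14)
s = conv(x)                          # student block output
t = s + 0.1 * torch.randn_like(s)    # stand-in for the teacher output
Q = fit_pointwise_conv(s, t)
y_two_step = torch.einsum('oc,nchw->nohw', Q, s)  # conv followed by 1x1
merge_pointwise_into_conv(conv, Q)
assert torch.allclose(y_two_step, conv(x), atol=1e-4)

Because the fit is a closed-form least-squares solve rather than gradient descent, this is consistent with the abstract's claim of recovering accuracy "in minutes" from a handful of unlabeled samples.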
Pages: 14627-14635
Number of pages: 9