Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Cited: 627
Authors
Zhang, Linfeng [1 ]
Song, Jiebo [3 ]
Gao, Anni [3 ]
Chen, Jingwei [4 ]
Bao, Chenglong [2 ]
Ma, Kaisheng [1 ]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[2] Tsinghua Univ, Yau Math Sci Ctr, Beijing, Peoples R China
[3] Inst Interdisciplinary Informat Core Technol, Beijing, Peoples R China
[4] HiSilicon, Shenzhen, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
DOI: 10.1109/ICCV.2019.00381
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Convolutional neural networks have been widely deployed in various application scenarios. To extend their reach into accuracy-critical domains, researchers have investigated ways to boost accuracy through deeper or wider network structures, which brings an exponential increase in computational and storage cost and delays response time. In this paper, we propose a general training framework named self distillation, which notably enhances the accuracy of convolutional neural networks by shrinking the size of the network rather than enlarging it. Unlike traditional knowledge distillation, a knowledge-transfer method between networks that forces a student network to approximate the softmax outputs of a pre-trained teacher network, the proposed self distillation framework distills knowledge within the network itself: the network is first divided into several sections, and the knowledge in the deeper sections is then squeezed into the shallower ones. Experiments further demonstrate the generality of the framework: the average accuracy gain is 2.65%, ranging from 0.61% on ResNeXt at minimum to 4.07% on VGG19 at maximum. In addition, it provides the flexibility of depth-wise scalable inference on resource-limited edge devices. Our code has been released on GitHub(5).
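The abstract's scheme, in which each shallow section of the network is trained against both the ground-truth label and the softened outputs of the deepest classifier, can be sketched as a per-classifier loss in plain Python. This is an illustrative reconstruction, not the authors' released code: the function names, the weighting factor `alpha`, and the temperature `T` are assumptions, and the paper's full objective also includes a feature-level L2 hint loss that is omitted here.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    m = max(l / T for l in logits)  # subtract max for numerical stability
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])

def self_distill_loss(shallow_logits, deep_logits, label, alpha=0.3, T=3.0):
    """Loss for one shallow classifier: (1 - alpha) * cross-entropy with the
    hard label, plus alpha * KL divergence toward the deepest classifier's
    temperature-softened distribution (scaled by T^2, the usual convention)."""
    ce = cross_entropy(softmax(shallow_logits), label)
    kd = kl_div(softmax(deep_logits, T), softmax(shallow_logits, T)) * T * T
    return (1 - alpha) * ce + alpha * kd
```

When the shallow classifier already matches the deepest one, the divergence term vanishes and only the hard-label term remains; during training, the deepest classifier itself is supervised by the labels alone.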
Pages: 3712-3721
Page count: 10