Extractive Knowledge Distillation

Cited: 4
Author
Kobayashi, Takumi [1]
Affiliation
[1] Natl Inst Adv Ind Sci & Technol, 1-1-1 Umezono, Tsukuba, Ibaraki, Japan
DOI
10.1109/WACV51458.2022.00142
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) transfers knowledge of a teacher model to improve the performance of a student model, which is usually equipped with lower capacity. In the KD framework, however, it is unclear what kind of knowledge is effective and how it is transferred. This paper analyzes the KD process to explore the key factors. In the KD formulation, the softmax temperature entangles three main components, the student probability, the teacher probability, and a weight for KD, making it hard to analyze the contribution of each factor separately. We disentangle those components so as to analyze the temperature in particular and to improve each component respectively. Based on the analysis of the temperature and the uniformity of the teacher probability, we propose a method, called extractive distillation, for extracting effective knowledge from the teacher model. The extractive KD touches only the teacher knowledge and is thus applicable to various KD methods. In experiments on image classification tasks using the CIFAR-100 and TinyImageNet datasets, we demonstrate that the proposed method outperforms other KD methods, and we analyze the feature representations to show its effectiveness in the framework of transfer learning.
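For context, the KD formulation the abstract analyzes is the standard temperature-scaled distillation loss, in which the softmax temperature appears in both the teacher and student probabilities and in the scaling of the KD term alongside the KD weight. The sketch below is a minimal PyTorch rendering of that standard loss, not the paper's extractive variant; the function name kd_loss and the default values for temperature and kd_weight are illustrative assumptions.

```python
# Minimal sketch of the standard temperature-scaled KD loss (Hinton-style),
# assuming illustrative defaults; not the paper's extractive distillation.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, kd_weight=0.9):
    """Combine a softened KD term with the usual cross-entropy on hard labels."""
    t = temperature
    # Teacher probabilities softened by T (no gradient flows to the teacher).
    soft_teacher = F.softmax(teacher_logits.detach() / t, dim=1)
    # Student log-probabilities softened by the same T.
    log_soft_student = F.log_softmax(student_logits / t, dim=1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # its gradient magnitude comparable to the cross-entropy term. This is
    # where temperature, teacher/student probabilities, and the KD weight
    # become entangled in the loss.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)
    # Standard cross-entropy on ground-truth labels with unscaled logits.
    ce_term = F.cross_entropy(student_logits, targets)
    return kd_weight * kd_term + (1.0 - kd_weight) * ce_term
```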
Pages: 1350-1359
Page count: 10