Heterogeneous Knowledge Distillation Using Conceptual Learning

Cited by: 0
Authors
Yu, Yerin [1 ]
Kim, Namgyu [1 ]
Affiliations
[1] Kookmin Univ, Grad Sch Business IT, Seoul 02707, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Knowledge distillation; conceptual learning; deep learning; pretrained model; model compression;
DOI
10.1109/ACCESS.2024.3387459
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Recent advances in deep learning have produced large, high-performing models pretrained on massive datasets. Deploying these models in real-world services, however, requires fast inference and low computational cost, which has driven interest in model compression techniques such as knowledge distillation, in which the knowledge learned by a teacher model is transferred to a smaller student model. Traditional knowledge distillation is limited in that the student learns from the teacher only the knowledge needed to solve the given task, which makes it difficult to respond appropriately to cases the student has not yet encountered. In this study, we propose a heterogeneous knowledge distillation method in which knowledge is distilled from a teacher model trained on higher-level concepts rather than on the target knowledge itself. The proposed methodology is motivated by the pedagogical finding that problems are solved better when learners acquire not only specific knowledge about the problem but also general knowledge of higher-level concepts. In particular, whereas traditional knowledge distillation can transfer knowledge only within the same task, transferring heterogeneous knowledge with the proposed methodology promises both performance gains for lightweight models and broader applicability of pretrained teacher models. Classification experiments on 70,000 images from the Fashion-MNIST machine learning benchmark confirm that the proposed heterogeneous knowledge distillation achieves higher classification accuracy than traditional knowledge distillation.
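
To make the contrast between traditional and heterogeneous distillation concrete, the following is a minimal sketch in PyTorch. The kd_loss function is the standard soft-target formulation; heterogeneous_kd_loss, the coarse-to-fine pooling matrix fine_to_coarse, the temperature T, and the weight alpha are illustrative assumptions about how a teacher trained on higher-level concepts could supervise a fine-grained student, not the authors' exact implementation.

# Minimal knowledge-distillation sketch (PyTorch). The coarse-to-fine pooling,
# temperature, and loss weight are illustrative assumptions, not the paper's
# exact formulation.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard soft-target distillation: cross-entropy on ground-truth labels
    plus KL divergence between temperature-softened teacher/student outputs."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kl

def heterogeneous_kd_loss(student_logits, coarse_teacher_logits, labels,
                          fine_to_coarse, T=4.0, alpha=0.5):
    """Hypothetical heterogeneous variant: the teacher predicts higher-level
    concepts, so the student's fine-class probabilities are pooled into the
    teacher's coarse classes before the soft-target term is computed."""
    ce = F.cross_entropy(student_logits, labels)
    student_prob = F.softmax(student_logits / T, dim=1)        # (B, n_fine)
    # fine_to_coarse: (n_fine, n_coarse) float 0/1 assignment matrix.
    coarse_student_prob = student_prob @ fine_to_coarse        # (B, n_coarse)
    kl = F.kl_div(
        torch.log(coarse_student_prob + 1e-8),
        F.softmax(coarse_teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kl

# Example wiring (hypothetical): 10 Fashion-MNIST classes pooled into, say,
# 3 coarse concepts such as "tops", "bottoms", and "footwear".
# fine_to_coarse = torch.zeros(10, 3)
# fine_to_coarse[fine_class_index, coarse_class_index] = 1.0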
Pages: 52803-52814
Page count: 12