Data-free knowledge distillation in neural networks for regression

Cited by: 28
Authors
Kang, Myeonginn [1 ]
Kang, Seokho [1 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Ind Engn, 2066 Seobu Ro, Suwon 16419, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Neural network; Knowledge distillation; Data-free knowledge distillation; Zero-shot knowledge distillation; Regression;
DOI
10.1016/j.eswa.2021.114813
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Knowledge distillation has been used successfully to compress a large neural network (teacher) into a smaller neural network (student) by transferring the knowledge of the teacher network using its original training dataset. However, the original training dataset is often not reusable in real-world applications. To address this issue, data-free knowledge distillation, i.e., knowledge distillation in the absence of the original training dataset, has been studied. However, existing methods are limited to classification problems and cannot be directly applied to regression problems. In this study, we propose a novel data-free knowledge distillation method that is applicable to regression problems. Given a teacher network, we adopt a generator network to transfer the knowledge in the teacher network to a student network, and we train the generator and student networks simultaneously in an adversarial manner. The generator network is trained to create synthetic data on which the teacher and student networks make different predictions, while the student network is trained to mimic the teacher network's predictions on these data. We demonstrate the effectiveness of the proposed method on benchmark datasets. The results show that the student network emulates the prediction ability of the teacher network with little performance loss.
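The adversarial training procedure described in the abstract can be illustrated with a short sketch. The following is a minimal, illustrative implementation assuming PyTorch; the mlp helper, the network sizes, the noise dimension, and the optimizer and loss settings are placeholder assumptions and not the authors' actual architecture or hyperparameters.

# Minimal sketch of adversarial data-free knowledge distillation for regression.
# Assumes PyTorch; dimensions and hyperparameters below are illustrative
# placeholders, not the configuration used in the paper.
import torch
import torch.nn as nn

x_dim, z_dim = 8, 16  # input and noise dimensions (assumed)

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

teacher = mlp(x_dim, 1, hidden=128)   # pretrained teacher (frozen here)
student = mlp(x_dim, 1, hidden=32)    # smaller student to be trained
generator = mlp(z_dim, x_dim)         # maps noise to synthetic inputs

teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(1000):
    z = torch.randn(64, z_dim)

    # Generator step: create synthetic inputs on which the teacher and student
    # disagree, i.e. maximize the student's imitation error.
    x_syn = generator(z)
    loss_g = -nn.functional.mse_loss(student(x_syn), teacher(x_syn))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Student step: on the synthetic inputs, minimize the discrepancy with the
    # teacher's predictions (generator detached so only the student is updated).
    x_syn = generator(z).detach()
    loss_s = nn.functional.mse_loss(student(x_syn), teacher(x_syn))
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

In each iteration, the generator is updated to increase the teacher-student prediction discrepancy on synthetic inputs, and the student is then updated to reduce that discrepancy, corresponding to the adversarial scheme described in the abstract.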
Pages: 7