Compressing medical deep neural network models for edge devices using knowledge distillation

Cited by: 8
Authors
Alabbasy, F. MohiEldeen [1 ]
Abohamama, A. S. [1 ,2 ]
Alabbasy, Mohieldeen [1 ]
Affiliations
[1] Mansoura Univ, Fac Comp & Informat, Dept Comp Sci, Mansoura, Egypt
[2] Arab East Coll, Dept Comp Sci, Riyadh, Saudi Arabia
Keywords
Knowledge distillation; Deep models; Edge devices; Deep model compression techniques
DOI
10.1016/j.jksuci.2023.101616
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Recently, deep neural networks (DNNs) have been used successfully in many fields, particularly in medical diagnosis. However, deep learning (DL) models are expensive in terms of memory and computing resources, which hinders their deployment on resource-limited devices and in delay-sensitive systems. These deep models therefore need to be accelerated and compressed to smaller sizes so they can be deployed on edge devices without noticeably affecting their performance. In this paper, recent DNN acceleration and compression approaches are analyzed and compared with regard to their performance, applications, benefits, and limitations, with particular focus on knowledge distillation as a successful emergent approach in this field. In addition, a framework is proposed for developing knowledge-distilled DNN models that can be deployed on fog/edge devices for automatic disease diagnosis. To evaluate the proposed framework, two compressed medical diagnosis systems based on knowledge-distilled deep neural models are built, one for COVID-19 and one for Malaria. The experimental results show that these distilled models are compressed to 18.4% and 15% of the original model size and their responses accelerated by 6.14x and 5.86x, respectively, with no significant drop in performance (0.9% and 1.2%, respectively). Furthermore, the distilled models are compared with pruned and quantized models; the results reveal the superiority of the distilled models in terms of compression rate and response time. (c) 2023 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Pages: 21
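
As background for the abstract above: knowledge distillation trains a small "student" network to mimic a large "teacher" by matching the teacher's temperature-softened output distribution in addition to the ground-truth labels. The following is a minimal, illustrative PyTorch sketch of this response-based distillation loss; the toy models, temperature T, and weight alpha are assumptions made for the example, not the authors' actual framework.

# Minimal sketch of response-based knowledge distillation (Hinton et al., 2015).
# Hypothetical toy models and hyperparameters; NOT the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of a soft-target KL term and hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T*T factor keeps soft-target gradients on the same scale as the
    # hard-label term when the temperature varies.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy stand-ins: a larger "teacher" and a much smaller "student" classifier.
teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 2))
student = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
teacher.eval()  # the teacher is frozen; only the student is trained

x = torch.randn(32, 64)            # dummy batch of input features
y = torch.randint(0, 2, (32,))     # dummy binary labels (e.g., infected / healthy)

with torch.no_grad():
    t_logits = teacher(x)          # teacher predictions, no gradient needed
loss = distillation_loss(student(x), t_logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In a setup like the one the abstract describes, the teacher would be a large pretrained diagnostic network and the student a compact model sized for the target fog/edge device; only the student is deployed, which is what yields the compression and response-time gains reported above.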