DDDG: A dual bi-directional knowledge distillation method with generative self-supervised pre-training and its hardware implementation on SoC for ECG

Cited by: 2
Authors
Zhang, Huaicheng [1 ]
Liu, Wenhan [1 ]
Guo, Qianxi [1 ]
Shi, Jiguang [1 ]
Chang, Sheng [1 ]
Wang, Hao [1 ]
He, Jin [1 ]
Huang, Qijun [1 ]
Affiliations
[1] Wuhan Univ, Sch Phys & Technol, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation (KD); Self-supervised Learning (SSL); Masked time autoencoder (MTAE); Teaching others teaches yourself (TOTY); Cardiovascular diseases (CVD); CLASSIFICATION;
DOI
10.1016/j.eswa.2023.122969
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Nowadays, increases in computing power and data volume are boosting the development of deep learning. However, limited computational resources and the high cost of data labeling are two main obstacles to deploying such algorithms in various applications. Therefore, a novel method named Dual Distillation Double Gains (DDDG) is proposed: a dual bi-directional knowledge distillation (KD) method with generative self-supervised pre-training. In a self-supervised manner, models are pre-trained with unlabeled data. KD transfers knowledge from a large model to a lightweight one, which is more suitable for deployment on portable/mobile devices. Based on the teacher-student structure, a reconstructing teacher and a classifying teacher are pre-trained in advance. The reconstructing teacher distills feature-based knowledge to the student during the pretext task. The second distillation occurs during fine-tuning, where the classifying teacher mentors the student with response-based knowledge. Both distillations are bi-directional, so they also reinforce the teacher models in reverse. According to the experimental results, the F1 score of the student network on two datasets is improved by 8.69% and 9.26%, respectively; the corresponding improvements for the teachers are 4.82% and 8.33%. Additionally, DDDG outperforms other state-of-the-art algorithms by 5.25% and 2.06% in F1. For practical applications, DDDG is deployed on a system-on-a-chip (SoC) in a heterogeneous manner. Employing an ARM processor and an FPGA, the designed system accelerates DDDG by 4.09 times compared with a pure software deployment on the same SoC. Such efficient model deployment on heterogeneous systems is promising for practical applications.
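The dual bi-directional distillation described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' released implementation: feature_kd_loss stands in for the feature-based distillation used during generative pre-training, response_kd_loss for the response-based distillation used during fine-tuning, and finetune_step shows one way knowledge can also flow back to the teacher; the models, optimizers, and the alpha/temperature values are illustrative assumptions.

```python
# Minimal sketch of dual bi-directional knowledge distillation (illustrative,
# not the paper's implementation). Assumes PyTorch; models, optimizers, and
# the alpha/temperature weights are placeholder assumptions.
import torch
import torch.nn.functional as F


def feature_kd_loss(teacher_feat, student_feat):
    """Feature-based distillation for the pretext (reconstruction) stage."""
    return F.mse_loss(student_feat, teacher_feat)


def response_kd_loss(src_logits, dst_logits, temperature=4.0):
    """Response-based distillation: dst learns the softened outputs of src."""
    t = temperature
    soft_src = F.softmax(src_logits / t, dim=-1)
    log_soft_dst = F.log_softmax(dst_logits / t, dim=-1)
    # KL divergence between softened distributions, scaled by T^2 as usual.
    return F.kl_div(log_soft_dst, soft_src, reduction="batchmean") * t * t


def finetune_step(teacher, student, x, y, opt_t, opt_s, alpha=0.5):
    """One bi-directional fine-tuning step: each network distills from the
    other's detached logits, so the teacher is also reinforced in reverse."""
    logits_t, logits_s = teacher(x), student(x)
    loss_t = F.cross_entropy(logits_t, y) + alpha * response_kd_loss(logits_s.detach(), logits_t)
    loss_s = F.cross_entropy(logits_s, y) + alpha * response_kd_loss(logits_t.detach(), logits_s)
    opt_t.zero_grad(); loss_t.backward(); opt_t.step()
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return loss_t.item(), loss_s.item()
```

In this sketch the bi-directionality comes from updating both networks with a distillation term computed against the other's detached outputs; the paper's exact losses, masked time autoencoder pretext task, and training schedule may differ.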
Pages: 13