DDDG: A dual bi-directional knowledge distillation method with generative self-supervised pre-training and its hardware implementation on SoC for ECG

Cited by: 2
Authors
Zhang, Huaicheng [1 ]
Liu, Wenhan [1 ]
Guo, Qianxi [1 ]
Shi, Jiguang [1 ]
Chang, Sheng [1 ]
Wang, Hao [1 ]
He, Jin [1 ]
Huang, Qijun [1 ]
Affiliations
[1] Wuhan Univ, Sch Phys & Technol, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Knowledge distillation (KD); Self-supervised Learning (SSL); Masked time autoencoder (MTAE); Teaching others teaches yourself (TOTY); Cardiovascular diseases (CVD); CLASSIFICATION;
DOI
10.1016/j.eswa.2023.122969
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Nowadays, increases in computing power and data volume are boosting the development of deep learning. However, limited computational resources and the high cost of data labeling are two main obstacles to deploying such algorithms in various applications. Therefore, a novel method named Dual Distillation Double Gains (DDDG) is proposed: a dual bi-directional knowledge distillation (KD) method with generative self-supervised pre-training. In a self-supervised manner, models are pre-trained with unlabeled data. KD transfers knowledge from a large model to a lightweight one, which is more suitable for deployment on portable/mobile devices. Based on the teacher-student structure, a reconstructing teacher and a classifying teacher are pre-trained in advance. The reconstructing teacher distills feature-based knowledge to the student during the pretext task. The second distillation occurs during fine-tuning, where the classifying teacher mentors the student with response-based knowledge. Both distillations are bi-directional, so they also reinforce the teacher models in reverse. According to the experimental results, the F1 score of the student network on two datasets is improved by 8.69% and 9.26%, respectively; the corresponding improvements for the teachers are 4.82% and 8.33%. Additionally, DDDG outperforms other state-of-the-art algorithms by 5.25% and 2.06% in F1. For practical applications, DDDG is deployed on a system-on-a-chip (SoC) in a heterogeneous manner. Employing an ARM processor and an FPGA, the designed system accelerates DDDG by 4.09 times compared with a pure software deployment on the same SoC. Such efficient model deployment on heterogeneous systems is promising for practical applications.
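The dual bi-directional distillation described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' released implementation: feature_kd_loss stands in for the feature-based distillation used during generative pre-training, response_kd_loss for the response-based distillation used during fine-tuning, and finetune_step shows one way knowledge can also flow back to the teacher; the models, optimizers, and the alpha/temperature values are illustrative assumptions.

```python
# Minimal sketch of dual bi-directional knowledge distillation (illustrative,
# not the paper's implementation). Assumes PyTorch; models, optimizers, and
# the alpha/temperature weights are placeholder assumptions.
import torch
import torch.nn.functional as F


def feature_kd_loss(teacher_feat, student_feat):
    """Feature-based distillation for the pretext (reconstruction) stage."""
    return F.mse_loss(student_feat, teacher_feat)


def response_kd_loss(src_logits, dst_logits, temperature=4.0):
    """Response-based distillation: dst learns the softened outputs of src."""
    t = temperature
    soft_src = F.softmax(src_logits / t, dim=-1)
    log_soft_dst = F.log_softmax(dst_logits / t, dim=-1)
    # KL divergence between softened distributions, scaled by T^2 as usual.
    return F.kl_div(log_soft_dst, soft_src, reduction="batchmean") * t * t


def finetune_step(teacher, student, x, y, opt_t, opt_s, alpha=0.5):
    """One bi-directional fine-tuning step: each network distills from the
    other's detached logits, so the teacher is also reinforced in reverse."""
    logits_t, logits_s = teacher(x), student(x)
    loss_t = F.cross_entropy(logits_t, y) + alpha * response_kd_loss(logits_s.detach(), logits_t)
    loss_s = F.cross_entropy(logits_s, y) + alpha * response_kd_loss(logits_t.detach(), logits_s)
    opt_t.zero_grad(); loss_t.backward(); opt_t.step()
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return loss_t.item(), loss_s.item()
```

In this sketch the bi-directionality comes from updating both networks with a distillation term computed against the other's detached outputs; the paper's exact losses, masked time autoencoder pretext task, and training schedule may differ.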
Pages: 13