KDnet-RUL: A Knowledge Distillation Framework to Compress Deep Neural Networks for Machine Remaining Useful Life Prediction

Cited by: 85
Authors
Xu, Qing [1 ]
Chen, Zhenghua [1 ]
Wu, Keyu [1 ]
Wang, Chao [2 ]
Wu, Min [1 ]
Li, Xiaoli [1 ]
Affiliations
[1] Inst Infocomm Res, Singapore 138632, Singapore
[2] Univ Sci & Technol China, Sch Comp Sci, Hefei 230052, Peoples R China
Funding
US National Science Foundation
Keywords
Predictive models; Data models; Knowledge engineering; Feature extraction; Prediction algorithms; Deep learning; Neural networks; Generative adversarial network (GAN); knowledge distillation (KD); model compression; remaining useful life (RUL) prediction; PROGNOSTICS;
DOI
10.1109/TIE.2021.3057030
Chinese Library Classification
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Machine remaining useful life (RUL) prediction is vital for improving the reliability of industrial systems and reducing maintenance costs. Recently, long short-term memory (LSTM) based algorithms have achieved state-of-the-art performance for RUL prediction owing to their strong capability of modeling sequential sensory data. In many cases, RUL prediction algorithms must be deployed on edge devices to support real-time decision making, reduce data communication costs, and preserve data privacy. However, powerful LSTM-based methods of high complexity cannot run on edge devices with limited computational power and memory. To solve this problem, we propose a knowledge distillation framework, named KDnet-RUL, to compress a complex LSTM-based model for RUL prediction. Specifically, it includes a generative adversarial network based knowledge distillation (GAN-KD) for knowledge transfer between disparate architectures, a learning-during-teaching based knowledge distillation (LDT-KD) for knowledge transfer between identical architectures, and a sequential distillation built upon LDT-KD for complicated datasets. We verify the effectiveness of the proposed KDnet-RUL on both simple and complicated datasets. The results demonstrate that the proposed method significantly outperforms state-of-the-art KD methods. The compressed model, with 12.8 times fewer weights and 46.2 times fewer total floating-point operations, even achieves performance comparable to that of the complex LSTM model for RUL prediction.
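For intuition, the sketch below illustrates the basic teacher-student distillation idea that frameworks like KDnet-RUL build on: a small student network is trained on a blend of ground-truth RUL labels and the soft predictions of a frozen, more complex LSTM teacher. This is only a minimal response-based distillation sketch in PyTorch, not the authors' implementation; the GAN-KD, LDT-KD, and sequential distillation components described in the abstract are omitted, and all model definitions, sensor counts, and hyperparameters (TeacherLSTM, StudentCNN, alpha, etc.) are hypothetical.

```python
# Minimal response-based knowledge distillation sketch for RUL regression.
# NOT the authors' KDnet-RUL: GAN-KD / LDT-KD omitted; names hypothetical.
import torch
import torch.nn as nn

class TeacherLSTM(nn.Module):
    """Complex LSTM teacher (stands in for the paper's LSTM model)."""
    def __init__(self, n_sensors=14, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, time, sensors)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])          # RUL from the last time step

class StudentCNN(nn.Module):
    """Lightweight 1-D CNN student suitable for edge deployment."""
    def __init__(self, n_sensors=14, channels=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(channels, 1)

    def forward(self, x):                     # x: (batch, time, sensors)
        z = self.conv(x.transpose(1, 2)).squeeze(-1)
        return self.head(z)

def distillation_loss(student_rul, teacher_rul, true_rul, alpha=0.5):
    """Blend ground-truth supervision with the teacher's soft targets."""
    mse = nn.functional.mse_loss
    return alpha * mse(student_rul, true_rul) + \
           (1 - alpha) * mse(student_rul, teacher_rul.detach())

# Usage: one distillation step on a dummy batch.
teacher, student = TeacherLSTM(), StudentCNN()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(32, 30, 14)                   # 32 windows, 30 steps, 14 sensors
y = torch.rand(32, 1) * 125                   # RUL labels (e.g., capped at 125)
with torch.no_grad():
    t_pred = teacher(x)                       # frozen teacher predictions
loss = distillation_loss(student(x), t_pred, y)
opt.zero_grad()
loss.backward()
opt.step()
```

Weighting the two mean-squared-error terms with alpha lets the student exploit the teacher's smoothed targets while staying anchored to the true labels; the paper's GAN-KD variant replaces this fixed loss with an adversarially learned feature-matching objective.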
Pages: 2022-2032
Page count: 11