A study of Knowledge Distillation in Fully Convolutional Network for Time Series Classification

Cited by: 5
Authors
Ay, Emel [1 ]
Devanne, Maxime [1 ]
Weber, Jonathan [1 ]
Forestier, Germain [1 ]
Affiliations
[1] Univ Haute Alsace, IRIMAS, Mulhouse, France
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Time Series Classification; Knowledge Distillation; Fully Convolutional Network;
DOI
10.1109/IJCNN55064.2022.9892915
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, deep learning has revolutionized the field of machine learning. While many applications of deep learning are found in computer vision, other domains such as natural language processing (NLP) and speech recognition have also benefited from advances in deep learning research. More recently, the field of time series analysis, and more specifically time series classification (TSC), has witnessed the emergence of deep neural networks providing competitive results. Over the years, the proposed network architectures have become deeper and deeper, pushing performance higher. While these very deep models achieve impressive accuracy, their training and deployment have become challenging. Indeed, a large number of GPUs is often required to train state-of-the-art networks and obtain high performance. While the requirements of the training step can be acceptable, deploying very deep neural networks can be difficult, especially on embedded systems (e.g. robots) or devices with limited resources (e.g. web browsers, smartphones). In this context, knowledge distillation is a machine learning task that consists of transferring knowledge from a large model to a smaller one with fewer parameters. The goal is to create a lighter model that mimics the predictions of a larger one in order to obtain similar performance at a fraction of the computational cost. In this paper, we introduce and explore the concept of knowledge distillation for the specific task of TSC. We also present a first experimental study showing promising results on several datasets of the UCR time series archive. As current state-of-the-art models for TSC are deep and sometimes ensembles of models, we believe that knowledge distillation could become an important research area in the coming years.
Pages: 8
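The abstract describes the standard distillation setup: a compact student network is trained to mimic the softened class probabilities of a larger teacher, in addition to fitting the ground-truth labels. The sketch below illustrates this idea for a 1-D Fully Convolutional Network student in PyTorch. It is a minimal, generic example; the layer sizes, temperature T, and weight alpha are illustrative assumptions and not the architecture or hyperparameters evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallFCN(nn.Module):
    """A reduced FCN for univariate time series (hypothetical student;
    filter counts and kernel sizes are illustrative, not the paper's)."""
    def __init__(self, n_classes, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            # padding="same" keeps the time length (PyTorch >= 1.9)
            nn.Conv1d(in_channels, 32, kernel_size=8, padding="same"),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding="same"),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 32, kernel_size=3, padding="same"),
            nn.BatchNorm1d(32), nn.ReLU(),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        # x: (batch, channels, length); global average pooling over time
        h = self.features(x).mean(dim=-1)
        return self.classifier(h)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style distillation: KL divergence between softened teacher and
    student distributions (scaled by T^2), blended with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop, each batch would be passed through a frozen, pre-trained teacher FCN and through the student; only the student's parameters are updated by minimizing distillation_loss, yielding a lighter model that approximates the teacher's predictions.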