Time-series classification with SAFE: Simple and fast segmented word embedding-based neural time series classifier

Cited by: 13
Authors
Tabassum, Nuzhat [1, 2]
Menon, Sujeendran [1 ]
Jastrzebska, Agnieszka [1 ]
Affiliations
[1] Warsaw Univ Technol, Fac Math & Informat Sci, Ul Koszykowa 75, PL-00662 Warsaw, Poland
[2] Mil Inst Sci & Technol, Dept Comp Sci & Engn, Dhaka 1216, Bangladesh
Keywords
Time series; Classification; Word embedding; Neural network; FOREST;
DOI
10.1016/j.ipm.2022.103044
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Dictionary-based classifiers are an essential group of approaches in the field of time series classification. Their distinctive characteristic is that they transform time series into segments made of symbols (words) and then classify the time series using these words. Dictionary-based approaches are suitable for datasets containing time series of unequal length. The prevalence of dictionary-based methods inspired the research in this paper. We propose a new dictionary-based classifier called SAFE. The new approach transforms the raw numeric data into a symbolic representation using the Symbolic Aggregate approXimation (SAX) method. We then partition the symbolic time series into a sequence of words and train the classifier with a word embedding neural model known from Natural Language Processing. The proposed scheme was applied to 30 benchmark datasets and compared with a range of state-of-the-art time series classifiers. The name SAFE comes from our observation that this method is safe to use. Empirical experiments have shown that SAFE gives excellent results: it is always in the top 5%-10% when we rank the classification accuracy of state-of-the-art algorithms for various datasets. Our method ranks third in the list of state-of-the-art dictionary-based approaches (after the WEASEL and BOSS methods).
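The preprocessing steps named in the abstract (SAX symbolization, then partitioning into words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `sax_symbols`, `to_words`, and `encode`, the parameters `n_segments`, `alphabet`, and `word_len`, and the choice of non-overlapping words are all assumptions made for illustration. The embedding-based classifier itself is not shown; the integer ids produced at the end are the kind of input an embedding layer would typically consume.

```python
from statistics import NormalDist, mean, pstdev


def sax_symbols(series, n_segments=8, alphabet="abcd"):
    """Z-normalize, reduce with PAA, then map each segment mean to a letter."""
    n = len(series)
    if n < n_segments:
        raise ValueError("series must have at least n_segments points")
    mu, sigma = mean(series), pstdev(series)
    # A constant series has zero variance; treat it as all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each near-equal chunk.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # The letter index is the number of breakpoints below the segment mean.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)


def to_words(symbols, word_len=3):
    """Partition the symbol string into consecutive words.

    Assumption: words do not overlap, and the last word may be shorter.
    """
    return [symbols[i:i + word_len] for i in range(0, len(symbols), word_len)]


def encode(words):
    """Map each distinct word to an integer id (hypothetical vocabulary)."""
    vocab = {w: i for i, w in enumerate(sorted(set(words)))}
    return [vocab[w] for w in words], vocab


symbols = sax_symbols(list(range(8)), n_segments=4, alphabet="abcd")
words = to_words(symbols, word_len=2)
ids, vocab = encode(words)
```

Because each series becomes a word sequence whose length depends on the series length, a representation like this can accommodate series of unequal length, which the abstract names as a strength of dictionary-based approaches.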
Pages: 25