Fast classification of univariate and multivariate time series through shapelet discovery

Cited by: 51
Authors
Grabocka, Josif [1]
Wistuba, Martin [1]
Schmidt-Thieme, Lars [1]
Affiliations
[1] University of Hildesheim, Information Systems and Machine Learning Lab, D-31141 Hildesheim, Germany
Keywords
Time-series classification; Multivariate time series; Shapelet discovery;
DOI
10.1007/s10115-015-0905-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Time-series classification is an important problem for the data mining community due to the wide range of application domains involving time-series data. A recent paradigm, called shapelets, represents patterns that are highly predictive for the target variable. Shapelets are discovered by measuring the prediction accuracy of a set of potential (shapelet) candidates. The candidates typically consist of all the segments of a dataset; therefore, the discovery of shapelets is computationally expensive. This paper proposes a novel method that avoids measuring the prediction accuracy of similar candidates in Euclidean distance space, through an online clustering/pruning technique. In addition, our algorithm incorporates a supervised shapelet selection that keeps only those candidates that improve classification accuracy. Empirical evidence on 45 univariate datasets from the UCR collection demonstrates that our method is 3-4 orders of magnitude faster than the fastest existing shapelet discovery method, while providing better prediction accuracy. In addition, we extended our method to multivariate time-series data. Runtime results over four real-life multivariate datasets indicate that our method can classify MB-scale data in a matter of seconds and GB-scale data in a matter of minutes. The achievements do not compromise quality; on the contrary, our method is even superior to the multivariate baseline in terms of classification accuracy.
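
The abstract does not spell out the pruning procedure, so the following is only an illustrative sketch of the general idea of skipping candidates that lie close to an already-kept candidate in Euclidean distance. The function names (`sliding_segments`, `prune_similar`) and the threshold `epsilon` are hypothetical and not taken from the paper; the supervised selection step is not shown.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length segments."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sliding_segments(series: Sequence[float], length: int) -> List[List[float]]:
    """All contiguous segments of the given length (the shapelet candidates)."""
    return [list(series[i:i + length]) for i in range(len(series) - length + 1)]


def prune_similar(candidates: List[List[float]], epsilon: float) -> List[List[float]]:
    """Process candidates one by one (online) and keep a candidate only if it is
    farther than epsilon from every candidate kept so far.

    epsilon is an assumed, user-chosen similarity threshold.
    """
    kept: List[List[float]] = []
    for cand in candidates:
        if all(euclidean(cand, k) > epsilon for k in kept):
            kept.append(cand)
    return kept


# Toy series: the first and last windows of length 3 are identical.
series = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
candidates = sliding_segments(series, 3)
kept = prune_similar(candidates, epsilon=0.5)
```

In this toy run, the last window duplicates the first and is skipped, so only the remaining candidates would need their prediction accuracy measured.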
Pages: 429-454
Number of pages: 26