Fast classification of univariate and multivariate time series through shapelet discovery

Authors
Josif Grabocka
Martin Wistuba
Lars Schmidt-Thieme
Affiliation
[1] University of Hildesheim, Information Systems and Machine Learning Lab
Source
Knowledge and Information Systems | 2016 / Vol. 49
Keywords
Time-series classification; Multivariate time series; Shapelet discovery
Abstract
Time-series classification is an important problem for the data mining community due to the wide range of application domains involving time-series data. A recent paradigm, called shapelets, represents patterns that are highly predictive of the target variable. Shapelets are discovered by measuring the prediction accuracy of a set of potential (shapelet) candidates. The candidates typically consist of all the segments of a dataset; therefore, the discovery of shapelets is computationally expensive. This paper proposes a novel method that avoids measuring the prediction accuracy of similar candidates in Euclidean distance space, through an online clustering/pruning technique. In addition, our algorithm incorporates a supervised shapelet selection that retains only those candidates that improve classification accuracy. Empirical evidence on 45 univariate datasets from the UCR collection demonstrates that our method is 3–4 orders of magnitude faster than the fastest existing shapelet discovery method, while providing better prediction accuracy. In addition, we extend our method to multivariate time-series data. Runtime results over four real-life multivariate datasets indicate that our method can classify MB-scale data in a matter of seconds and GB-scale data in a matter of minutes. These gains do not compromise quality; on the contrary, our method is even superior to the multivariate baseline in terms of classification accuracy.
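The two ideas the abstract names, matching a candidate against every segment of a series and skipping candidates that lie close to one already kept in Euclidean distance space, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the fixed `threshold` parameter, and the greedy keep-or-drop rule are assumptions made for illustration, and the supervised selection step is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_distance(series, candidate):
    """Smallest distance between `candidate` and any window of `series`
    that has the same length as the candidate."""
    m = len(candidate)
    best = math.inf
    for start in range(len(series) - m + 1):
        best = min(best, euclidean(series[start:start + m], candidate))
    return best


def prune_similar(candidates, threshold):
    """Hypothetical online pruning: process candidates one at a time and
    keep a candidate only if it is farther than `threshold` from every
    candidate kept so far. Candidates must all have the same length."""
    kept = []
    for cand in candidates:
        if all(euclidean(cand, k) > threshold for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
    candidates = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.1], [2.0, 1.0, 0.0]]
    survivors = prune_similar(candidates, threshold=0.5)
    # Only the surviving candidates would need their accuracy evaluated.
    features = [min_distance(series, c) for c in survivors]
    print(survivors, features)
```

In this sketch the minimum distances of a series to the surviving candidates act as its features for a downstream classifier; pruning reduces how many candidates need to be scored at all, which is the source of the speed-up the abstract describes.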
Pages: 429–454
Page count: 25