Time series shapelets: a novel technique that allows accurate, interpretable and fast classification

被引:279
作者
Ye, Lexiang [1 ]
Keogh, Eamonn [1 ]
机构
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
基金
美国国家科学基金会;
关键词
Time series; Data mining; Classification; Decision tree;
D O I
10.1007/s10618-010-0179-5
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Classification of time series has been attracting great interest over the past decade. While dozens of techniques have been introduced, recent empirical evidence has strongly suggested that the simple nearest neighbor algorithm is very difficult to beat for most time series problems, especially for large-scale datasets. While this may be considered good news, given the simplicity of implementing the nearest neighbor algorithm, there are some negative consequences of this. First, the nearest neighbor algorithm requires storing and searching the entire dataset, resulting in a high time and space complexity that limits its applicability, especially on resource-limited sensors. Second, beyond mere classification accuracy, we often wish to gain some insight into the data and to make the classification result more explainable, which global characteristics of the nearest neighbor cannot provide. In this work we introduce a new time series primitive, time series shapelets, which addresses these limitations. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. We can use the distance to the shapelet, rather than the distance to the nearest neighbor to classify objects. As we shall show with extensive empirical evaluations in diverse domains, classification algorithms based on the time series shapelet primitives can be interpretable, more accurate, and significantly faster than state-of-the-art classifiers.
引用
收藏
页码:149 / 182
页数:34
相关论文
共 31 条
[1]  
[Anonymous], P INT C VER LARG DAT
[2]  
[Anonymous], LECT NOTES COMPUTER
[3]  
[Anonymous], CMU GRAPH LAB MOT CA
[4]  
[Anonymous], 2003, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, DOI DOI 10.1145/956750.956808
[5]  
Berndt DJ., 1994, USING DYNAMIC TIME W, DOI DOI 10.5555/3000850.3000887
[6]   SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J].
Blewitt, Marnie E. ;
Gendrel, Anne-Valerie ;
Pang, Zhenyi ;
Sparrow, Duncan B. ;
Whitelaw, Nadia ;
Craig, Jeffrey M. ;
Apedaile, Anwyn ;
Hilton, Douglas J. ;
Dunwoodie, Sally L. ;
Brockdorff, Neil ;
Kay, Graham F. ;
Whitelaw, Emma .
NATURE GENETICS, 2008, 40 (05) :663-669
[7]   Discrimination of Arabica and Robusta in instant coffee by Fourier transform infrared spectroscopy and chemometrics [J].
Briandet, R ;
Kemsley, EK ;
Wilson, RH .
JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY, 1996, 44 (01) :170-174
[8]  
Ding H, 2008, PROC VLDB ENDOW, V1, P1542
[9]  
FALOUTSOS C, 1994, ACM SIGMOD REC 23 JU, V2, P419
[10]  
GRAMM J, 2003, LECT NOTES COMPUTER, V2751, P963