k-Shape: Efficient and Accurate Clustering of Time Series

被引:513
作者
Paparrizos, John [1 ]
Gravano, Luis [1 ]
机构
[1] Columbia Univ, New York, NY 10027 USA
来源
SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2015年
关键词
ALGORITHM; REPRESENTATION;
D O I
10.1145/2723372.2737793
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining methods, not only due to its exploratory power, but also as a preprocessing step or subroutine for other techniques. In this paper, we present k-Shape, a novel algorithm for time-series clustering. k-Shape relies on a scalable iterative refinement procedure, which creates homogeneous and well-separated clusters. As its distance measure, k-Shape uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them. Based on the properties of that distance measure, we develop a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters. To demonstrate the robustness of k-Shape, we perform an extensive experimental evaluation of our approach against partitional, hierarchical, and spectral clustering methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable approaches in terms of accuracy. Furthermore, k-Shape also outperforms all non-scalable (and hence impractical) combinations, with one exception that achieves similar accuracy results. However, unlike k-Shape, this combination requires tuning of its distance measure and is two orders of magnitude slower than k-Shape. Overall, k-Shape emerges as a domain-independent, highly accurate, and highly efficient clustering approach for time series with broad applications.
引用
收藏
页码:1855 / 1870
页数:16
相关论文
共 89 条
[41]  
KALPAKIS K, 2001, ICDM, P273, DOI DOI 10.1109/ICDM.2001.989529
[42]  
Katznelson Y., 2004, An introduction to harmonic analysis, V3rd
[43]  
Kaufman L., 2009, Finding Groups in Data: An Introduction to Cluster Analysis
[44]   Clustering of time-series subsequences is meaningless: implications for previous and future research [J].
Keogh, E ;
Lin, J .
KNOWLEDGE AND INFORMATION SYSTEMS, 2005, 8 (02) :154-177
[45]   Exact indexing of dynamic time warping [J].
Keogh, E ;
Ratanamahatana, CA .
KNOWLEDGE AND INFORMATION SYSTEMS, 2005, 7 (03) :358-386
[46]   Locally adaptive dimensionality reduction for indexing large time series databases [J].
Keogh, E ;
Chakrabarti, K ;
Mehrotra, S ;
Pazzani, M .
SIGMOD RECORD, 2001, 30 (02) :151-162
[47]  
Korn F., 1997, SIGMOD Record, V26, P289, DOI 10.1145/253262.253332
[48]  
Lian X., 2007, P 23 IEEE INT C DAT, P1086
[49]   Clustering of time series data - a survey [J].
Liao, TW .
PATTERN RECOGNITION, 2005, 38 (11) :1857-1874
[50]  
Lin J, 2004, LECT NOTES COMPUT SC, V2992, P106