Distance and Density Clustering for Time Series Data

Times cited: 22
Authors
Ma, Ruizhe [1]
Angryk, Rafal A. [1]
Affiliation
[1] Georgia State Univ, Atlanta, GA 30303 USA
Source
2017 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017) | 2017
Keywords
Time Series Clustering; Density Clustering; Dynamic Time Warping
DOI
10.1109/ICDMW.2017.11
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering is an important branch of data mining and statistical analysis and is widely used in exploratory analysis. Many algorithms exist for clustering in Euclidean space. However, time series clustering introduces new problems, such as inadequate distance measures, inaccurate descriptions of cluster centers, and a lack of efficient and accurate clustering techniques. For time series data, Dynamic Time Warping (DTW) is an accepted and effective distance measure. For cluster updates and representation, the DTW Barycenter Averaging (DBA) algorithm, a global averaging method based on DTW, has proven to be an effective averaging method for time series data. In this paper, we propose Distance Density clustering, a medoid-based clustering method that takes time series density into account and produces clustering results in a hierarchical fashion. First, we introduce two clustering initialization techniques: from the time series similarity matrix, we use majority voting to select either the nearest or the furthest time series as the initial clustering seed. As a result, our clustering method is deterministic, and its results can always be reproduced. In the Distance Density clustering algorithm, we use medoids because they are a more representative alternative to the statistical mean, especially for time series data, where the mean value often does not exist. The time series density is a virtual density based on time series similarity; it can find more natural splits in a dataset, and the number of clusters does not need to be determined a priori. Experiments applying the Distance Density clustering technique to the UCR dataset demonstrate that clustering initialization is crucial for obtaining results that are more stable and, on average, better than those of random initialization, and that the technique is also more accurate than traditional distance clustering.
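The abstract relies on three building blocks: a DTW distance matrix, medoids as cluster representatives, and a deterministic majority-voting seed chosen from the similarity matrix. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. It assumes plain O(nm) DTW with a squared point-wise cost and no warping window, and defines a medoid as the member with the smallest sum of DTW distances to the other members. The voting rule (each series votes for its nearest, or furthest, other series, and the most-voted series becomes the seed, with ties broken by lowest index) is a hypothetical reading of the initialization described above. All function names are hypothetical. DBA and the virtual-density step are not shown.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW (squared point-wise cost, no warping window).
    Returns the square root of the minimal accumulated cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


def distance_matrix(series: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise DTW distances."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


def medoid(indices: List[int], dist: List[List[float]]) -> int:
    """Member whose summed distance to the other members is smallest."""
    return min(indices, key=lambda i: sum(dist[i][j] for j in indices))


def majority_vote_seed(dist: List[List[float]], nearest: bool = True) -> int:
    """HYPOTHETICAL seed rule: every series votes for its nearest (or
    furthest) other series; the series with the most votes is the seed.
    min/max return the first index on ties, so the result is deterministic."""
    k = len(dist)
    votes = [0] * k
    pick = min if nearest else max
    for i in range(k):
        others = [j for j in range(k) if j != i]
        target = pick(others, key=lambda j: dist[i][j])
        votes[target] += 1
    return max(range(k), key=lambda i: votes[i])


if __name__ == "__main__":
    data = [[0, 1, 2, 3], [0, 1, 2, 3, 3], [0, 1, 2, 4], [5, 5, 5, 5]]
    d = distance_matrix(data)
    print(majority_vote_seed(d, nearest=True))   # 0
    print(majority_vote_seed(d, nearest=False))  # 3
    print(medoid([0, 1, 2], d))                  # 0
```

Because every step is a fixed computation over the distance matrix, repeated runs on the same data return the same seed, which is the reproducibility property the abstract attributes to majority-voting initialization as opposed to random initialization.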
Pages: 25-32
Page count: 8