Clustering of large time series datasets

Cited by: 22
Authors
Aghabozorgi, Saeed [1]
Teh, Ying Wah [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
Keywords
Data mining; clustering; time series; large datasets; fast similarity search; dimensionality reduction; averaging method; representation; retrieval; algorithm
DOI
10.3233/IDA-140669
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Time series clustering is an effective approach for discovering valuable information in systems such as finance, embedded bio-sensors, and genomics. However, the focus on making these algorithms efficient and scalable for time series data has come at the expense of the usability and effectiveness of clustering. This paper proposes a new multi-step approach to improve the accuracy of time series clustering. In the first step, the time series are clustered approximately. In the second step, the resulting clusters are split into sub-clusters. In the third step, the sub-clusters are merged. In contrast to existing approaches, this method can generate accurate clusters based on similarity in shape in very large time series datasets. The accuracy of the proposed method is evaluated using various published datasets from different domains.
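The abstract names three steps (approximate clustering, splitting, merging) without giving details. The following is a minimal sketch of that general pattern, not the authors' algorithm. Everything specific in it is an assumption made for illustration: piecewise aggregate approximation (PAA) as the coarse representation, a simple threshold-based "leader" grouping in place of a real clustering algorithm, dynamic time warping (DTW) as the shape-based distance for merging, and all function names and radius parameters.

```python
import math

def znorm(s):
    # Rescale a series to zero mean and unit variance (assumed preprocessing).
    m = sum(s) / len(s)
    sd = math.sqrt(sum((x - m) ** 2 for x in s) / len(s)) or 1.0
    return [(x - m) / sd for x in s]

def paa(s, segments):
    # Piecewise aggregate approximation: mean of each segment.
    # Assumes len(s) >= segments so that no segment is empty.
    n = len(s)
    bounds = [i * n // segments for i in range(segments + 1)]
    return [sum(s[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    # Classic O(len(a) * len(b)) dynamic time warping with squared point cost.
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, 1):
            cur.append((x - y) ** 2 + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return math.sqrt(prev[-1])

def leader(items, dist, radius):
    # Hypothetical stand-in for a clustering step: each item joins the first
    # group whose leader (first member) lies within `radius`, else starts one.
    groups = []
    for i, item in enumerate(items):
        for g in groups:
            if dist(items[g[0]], item) <= radius:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def mean_series(rows):
    # Pointwise mean of equal-length series, used as a sub-cluster prototype.
    return [sum(col) / len(rows) for col in zip(*rows)]

def multi_step_cluster(series, segments=4, coarse_r=1.0, fine_r=0.3, merge_r=1.0):
    data = [znorm(s) for s in series]
    # Step 1: approximate clustering on the reduced (PAA) representation.
    coarse = leader([paa(s, segments) for s in data], euclid, coarse_r)
    # Step 2: split each coarse cluster using the full-resolution series.
    subs = []
    for g in coarse:
        for sg in leader([data[i] for i in g], euclid, fine_r):
            subs.append([g[k] for k in sg])
    # Step 3: merge sub-clusters whose prototypes are close in shape (DTW).
    protos = [mean_series([data[i] for i in sg]) for sg in subs]
    merged = leader(protos, dtw, merge_r)
    return [sorted(i for k in mg for i in subs[k]) for mg in merged]
```

The intent of the sketch is only to show why the steps are ordered this way: the cheap reduced representation handles the full dataset, the more expensive full-resolution comparison runs only inside each coarse cluster, and the costly shape-based distance is applied to a small number of prototypes rather than to every pair of series.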
Pages: 793-817
Page count: 25