Clustering of large time series datasets

Cited by: 22
Authors
Aghabozorgi, Saeed [1]
Teh, Ying Wah [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
Keywords
Data mining; clustering; time series; large datasets; fast similarity search; dimensionality reduction; averaging method; representation; retrieval; algorithm
DOI
10.3233/IDA-140669
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Time series clustering is an effective way to discover valuable information in systems such as finance, embedded bio-sensors, and genomics. However, the focus on efficiency and scalability in algorithms for time series data has come at the expense of the usability and effectiveness of clustering. This paper proposes a new multi-step approach to improve the accuracy of time series clustering. In the first step, the time series are clustered approximately. In the second step, the resulting clusters are split into sub-clusters. In the third step, the sub-clusters are merged. In contrast to existing approaches, this method can generate accurate clusters based on similarity in shape in very large time series datasets. The accuracy of the proposed method is evaluated on various published datasets from different domains.
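The three steps described in the abstract (approximate clustering, splitting, merging) can be illustrated with a minimal sketch. The abstract does not name the concrete techniques, so everything below is an assumption made for illustration: piecewise aggregate approximation (`paa`) with a plain k-means for the approximate step, a greedy leader-based `split`, a prototype-based `merge`, and Euclidean distance standing in for a shape-based measure. The function names, parameters, and thresholds are hypothetical and are not taken from the paper.

```python
import numpy as np

def paa(series, segments):
    """Piecewise aggregate approximation: mean of each of `segments` chunks."""
    return np.array([chunk.mean() for chunk in np.array_split(series, segments)])

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def split(data, members, radius):
    """Step 2 (assumed): greedy leader split on the full-resolution series."""
    subclusters = []  # each entry is a list of row indices; entry[0] is the leader
    for i in members:
        for sub in subclusters:
            if np.linalg.norm(data[i] - data[sub[0]]) <= radius:
                sub.append(i)
                break
        else:
            subclusters.append([i])
    return subclusters

def merge(data, subclusters, radius):
    """Step 3 (assumed): merge sub-clusters whose mean prototypes are close."""
    merged = []  # list of (prototype of the first sub-cluster, row indices)
    for sub in subclusters:
        proto = data[sub].mean(axis=0)
        for group_proto, indices in merged:
            if np.linalg.norm(proto - group_proto) <= radius:
                indices.extend(sub)
                break
        else:
            merged.append((proto, list(sub)))
    return [sorted(indices) for _, indices in merged]

def multi_step_cluster(data, k_coarse, segments, split_radius, merge_radius):
    # Step 1 (assumed): approximate clustering on a reduced representation.
    reduced = np.array([paa(row, segments) for row in data])
    coarse = kmeans(reduced, k_coarse)
    # Step 2: refine every coarse cluster into sub-clusters.
    subclusters = []
    for j in range(k_coarse):
        members = [int(i) for i in np.flatnonzero(coarse == j)]
        subclusters.extend(split(data, members, split_radius))
    # Step 3: merge sub-clusters across coarse clusters.
    return merge(data, subclusters, merge_radius)

# Synthetic example: ten noisy sine waves followed by ten noisy inverted sine waves.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 2.0 * np.pi, 64)
data = np.vstack([np.sin(t) + 0.05 * rng.standard_normal((10, 64)),
                  -np.sin(t) + 0.05 * rng.standard_normal((10, 64))])
clusters = multi_step_cluster(data, k_coarse=3, segments=8,
                              split_radius=2.0, merge_radius=2.0)
print(clusters)
```

In this sketch the approximate step only needs to be cheap, since any mixed coarse cluster is separated again by the split step and over-fragmented groups are rejoined by the merge step. A shape-based distance could replace the Euclidean norm in `split` and `merge` without changing the overall structure.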
Pages: 793-817
Page count: 25