Approximate Clustering of Time-Series Datasets using k-Modes Partitioning

Times cited: 0
Authors
Aghabozorgi, Saeed [1]
Teh Ying Wah [1]
Affiliation
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
Keywords
data mining; clustering; time series; approximation; distance measure; dimensionality reduction; similarity search; representation
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data in various systems, such as those in finance, healthcare, and business, are stored as time series, and interest in time series mining in these areas has surged accordingly. Clustering is performed as a pre-processing or exploratory step in many data mining tasks. Time series data sets are often too large to fit in main memory for clustering, and dimension reduction is a common solution in this case. However, dimension reduction comes at a relatively high cost: it overlooks part of the data, which leads to low-quality clustering. In this paper, we propose a new approach that uses discretization to improve the accuracy of approximate clustering of dimensionality-reduced time series. A new distance measure is first introduced. Thereafter, the partitional algorithm that best matches the representation method is proposed.
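The general idea named in the abstract and title (reduce each series, discretize it into symbols, then cluster the symbolic words with a k-Modes style partitional algorithm) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the record does not describe the proposed distance measure, so plain Hamming distance stands in for it, and the reduction step is a simple piecewise mean. The function names, the breakpoints, and the four-letter alphabet are all hypothetical.

```python
import random
from collections import Counter


def paa(series, segments):
    """Piecewise mean of `segments` roughly equal chunks (needs len(series) >= segments)."""
    n = len(series)
    means = []
    for i in range(segments):
        chunk = series[i * n // segments:(i + 1) * n // segments]
        means.append(sum(chunk) / len(chunk))
    return means


def discretize(values, breakpoints, alphabet="abcd"):
    """Map each value to a symbol by counting how many breakpoints it exceeds."""
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in values)


def hamming(a, b):
    """Number of positions at which two equal-length words differ.

    Assumption: a stand-in for the paper's distance measure, which is not given here.
    """
    return sum(x != y for x, y in zip(a, b))


def mode_of(words):
    """Position-wise most frequent symbol across a group of words."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*words))


def k_modes(words, k, iterations=10, seed=0):
    """Basic k-Modes: assign each word to the nearest mode, then recompute modes."""
    rng = random.Random(seed)
    # Initial modes are k distinct words (requires at least k distinct words).
    modes = rng.sample(sorted(set(words)), k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: hamming(w, modes[j])) for w in words]
        for j in range(k):
            members = [w for w, lab in zip(words, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = mode_of(members)
    return labels, modes


if __name__ == "__main__":
    data = [
        [0, 0, 0, 0, 5, 5, 5, 5],
        [0, 0, 1, 0, 5, 4, 5, 5],
        [5, 5, 5, 5, 0, 0, 0, 0],
        [5, 4, 5, 5, 0, 1, 0, 0],
    ]
    # Hypothetical breakpoints chosen by hand for this toy data.
    words = [discretize(paa(s, 4), [1, 2.5, 4]) for s in data]
    print(words, k_modes(words, k=2))
```

Because clustering runs on short symbolic words rather than on the raw series, the data held in memory shrinks, which is the trade-off the abstract describes: the result is approximate, and its quality depends on how much information the discretization preserves.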
Pages: 207-228
Page count: 22