A Fast Semi-Supervised Clustering Framework for Large-Scale Time Series Data

Cited by: 18
Authors
He, Guoliang [1]
Pan, Yanzhou [2]
Xia, Xuewen [3]
He, Jinrong [4]
Peng, Rong [1]
Xiong, Neal N. [5]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430079, Peoples R China
[2] Rice Univ, Engn Dept, Houston, TX 77005 USA
[3] Minnan Normal Univ, Coll Phys & Informat Engn, Zhangzhou 363000, Peoples R China
[4] Yanan Univ, Coll Math & Comp Sci, Yanan 716000, Peoples R China
[5] Northeastern State Univ, Dept Math & Comp Sci, Tahlequah, OK 74464 USA
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2021 / Vol. 51 / Issue 07
Funding
National Natural Science Foundation of China;
Keywords
Time series analysis; Clustering algorithms; Time measurement; Velocity measurement; Shape measurement; Clustering methods; Contracts; Constraint propagation; semi-supervised learning; similarity measure; time series clustering; CLASSIFICATION;
DOI
10.1109/TSMC.2019.2931731
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Semi-supervised clustering algorithms have several limitations: 1) their computational complexity is very high, because calculating the similarity distances between pairs of examples is time-consuming; 2) traditional semi-supervised clustering methods do not consider how to make full use of must-link and cannot-link constraints: a few pairwise constraints contribute very little to clustering performance, and some may negatively affect the outcome; and 3) these methods do not handle high-dimensional data effectively, especially time series data. To date, few works have addressed semi-supervised clustering of time series data. To cluster large-scale time series data efficiently, we first tackle contract time series clustering, which aims to produce the most accurate clustering results within a contracted time. We propose a semi-supervised time series clustering framework (STSC), which integrates a fast similarity measure and a constraint propagation approach. Based on the proposed framework, two semi-supervised clustering algorithms, fssK-means and fssDBSCAN, are designed. Experiments on 11 datasets show that the proposed method is efficient and effective for clustering large-scale time series data.
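The abstract does not describe how STSC propagates constraints, so the following is only a generic illustration of the idea, not the authors' method. It is a minimal sketch that assumes must-link constraints are transitive and that a cannot-link between two items extends to every member of their must-link groups. The function name `propagate_constraints` and its parameters are hypothetical.

```python
from itertools import product


def propagate_constraints(n, must_link, cannot_link):
    """Expand pairwise constraints over items 0..n-1 by transitive closure.

    Generic textbook-style propagation (an assumption, not the STSC approach):
    returns (ml, cl), two sets of index pairs (a, b) with a < b.
    """
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge items connected by must-link constraints.
    for a, b in must_link:
        parent[find(a)] = find(b)

    # Collect the members of each must-link group.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside a group is an implied must-link.
    ml = {(a, b) for members in groups.values()
          for a in members for b in members if a < b}

    # A cannot-link between two groups applies to all cross-group pairs.
    cl = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"inconsistent constraints for pair ({a}, {b})")
        for x, y in product(groups[ra], groups[rb]):
            cl.add((min(x, y), max(x, y)))
    return ml, cl


ml, cl = propagate_constraints(5, must_link=[(0, 1), (1, 2)], cannot_link=[(2, 3)])
```

In this toy call, two must-link pairs and one cannot-link pair expand to three pairs of each kind, which illustrates how propagation can extract more supervision from a small number of labeled pairs.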
Pages: 4201-4216
Page count: 16