Clustering time series with clipped data

Cited by: 51
Authors
Bagnall, A [1]
Janacek, G [1]
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Keywords
clustering time series; clipping
DOI
10.1007/s10994-005-5825-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. Time series data are often large and may contain outliers. We show that the simple procedure of clipping the time series (discretising to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy. We also demonstrate that clipping increases clustering accuracy when there are outliers in the data, thus serving as a means of outlier detection and a method of identifying model misspecification. We consider simulated data from polynomial, autoregressive moving average and hidden Markov models and show that the estimated parameters of the clipped data used in clustering tend, asymptotically, to those of the unclipped data. We also demonstrate experimentally that, if the series are long enough, the accuracy on clipped data is not significantly less than the accuracy on unclipped data, and if the series contain outliers then clipping results in significantly better clusterings. We then illustrate how using clipped series can be of practical benefit in detecting model misspecification and outliers on two real world data sets: an electricity generation bid data set and an ECG data set.
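The clipping step described in the abstract (discretising each series to above or below its median) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `clip_series`, the sample values, and the choice to map values equal to the median to 0 are all assumptions made for illustration.

```python
from statistics import median


def clip_series(values):
    """Map each point to 1 if it lies above the series median, else 0."""
    m = median(values)
    # Assumption: points exactly equal to the median are mapped to 0.
    return [1 if v > m else 0 for v in values]


# Hypothetical example series.
series = [0.2, 1.5, -0.7, 0.9, -1.1, 0.4]
clipped = clip_series(series)

# Replace one point with an extreme outlier. The median is unchanged here,
# so the clipped pattern is unchanged as well.
with_outlier = list(series)
with_outlier[1] = 1000.0
clipped_outlier = clip_series(with_outlier)

print(clipped)          # [0, 1, 0, 1, 0, 1]
print(clipped_outlier)  # [0, 1, 0, 1, 0, 1]
```

Because each clipped point needs only one bit, a clipped series can be stored far more compactly than the original floating-point values, which is consistent with the memory savings the abstract reports.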
Pages: 151-178
Page count: 28
Related papers
50 in total
  • [1] Clustering Time Series with Clipped Data
    Anthony Bagnall
    Gareth Janacek
    Machine Learning, 2005, 58 : 151 - 178
  • [2] Shape clustering on time series data
    Zheng, Ch
    Zhang, L.
    2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES: ITESS 2008, VOL 3, 2008, : 1249 - 1253
  • [3] A clustering algorithm for time series data
    Yin, Jian
    Zhou, Duanning
    Xie, Qiong-Qiong
    SEVENTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2006, : 119 - +
  • [4] Clustering of time series data - a survey
    Liao, TW
    PATTERN RECOGNITION, 2005, 38 (11) : 1857 - 1874
  • [5] Application of Agglomerative Hierarchical Clustering for Clustering of Time Series Data
    Radovanovic, Ana
    Li, Junshi
    Milanovic, Jovica, V
    Milosavljevic, Nina
    Storchi, Riccardo
    2020 IEEE PES INNOVATIVE SMART GRID TECHNOLOGIES EUROPE (ISGT-EUROPE 2020): SMART GRIDS: KEY ENABLERS OF A GREEN POWER SYSTEM, 2020, : 640 - 644
  • [6] Clustering multivariate time-series data
    Singhal, A
    Seborg, DE
    JOURNAL OF CHEMOMETRICS, 2005, 19 (08) : 427 - 438
  • [7] Clustering of multivariate time-series data
    Singhal, A
    Seborg, DE
    PROCEEDINGS OF THE 2002 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 2002, 1-6 : 3931 - 3936
  • [8] Distance and Density Clustering for Time Series Data
    Ma, Ruizhe
    Angryk, Rafal A.
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2017), 2017, : 25 - 32
  • [9] Time Series Clustering of Energy Meter Data
    Majumder, Patrali
    Richter, Marc
    Gotze, Jens
    2022 IEEE INTERNATIONAL CONFERENCE ON ENVIRONMENT AND ELECTRICAL ENGINEERING AND 2022 IEEE INDUSTRIAL AND COMMERCIAL POWER SYSTEMS EUROPE (EEEIC / I&CPS EUROPE), 2022,
  • [10] Clustering multimedia data using time series
    Niennattrakul, Vit
    Ratanamahatana, Chotirat Ann
    2006 INTERNATIONAL CONFERENCE ON HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2006, : 372 - +