Two-Phase Multivariate Time Series Clustering to Classify Urban Rail Transit Stations

Cited by: 5
Authors
Zhang, Liying [1, 2]
Pei, Tao [2, 3]
Meng, Bin [4]
Lian, Yuanfeng [1]
Jin, Zhou [1]
Affiliations
[1] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China
[2] Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China
[3] Univ Chinese Acad Sci, Coll Resources & Environm, Beijing 100049, Peoples R China
[4] Beijing Union Univ, Coll Appl Arts & Sci, Beijing 100191, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8 / Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Time series analysis; Correlation; Shape; Correlation coefficient; Discrete wavelet transforms; Clustering algorithms; Multivariate time series; cluster; maximum overlap discrete wavelet transform; symbolic aggregate approximation (SAX); urban rail transit stations;
DOI
10.1109/ACCESS.2020.3022625
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Consider the problem of clustering objects described by several variables that change over time, for instance, classifying cities by several changing socioeconomic indices in geographical research. If the changing variables can be recorded simultaneously as a multivariate time series, in which the subseries are of equal length and may be correlated, the problem becomes a multivariate time series clustering problem. The available methods consider the correlations between distinct time series but overlook the shape of each time series, which causes multivariate time series with similar correlations but opposite shapes to be clustered into the same class. To overcome this problem, this paper proposes a two-phase multivariate time series clustering algorithm that considers both correlation and shape. In Phase I, the discrete wavelet transform is applied to capture the wavelet variances and the correlation coefficients between each pair of variables, yielding an initial clustering of the multivariate time series in which time series with similar correlations but opposite shapes may be assigned to the same cluster. In Phase II, the multivariate time series are clustered based on shape via the symbolic aggregate approximation (SAX) method, which differentiates time series with similar correlations but opposite shapes. The method is evaluated using multivariate time series of incoming and outgoing passenger volumes from Beijing IC card data collected between March 4, 2013 and March 17, 2013. Based on the silhouette coefficient, our approach outperforms two popular multivariate time series clustering methods: a wavelet-based method and the SAX method.
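To make the shape-based step concrete, the following is a minimal sketch of standard SAX symbolization (z-normalization, piecewise aggregate approximation, then discretization with Gaussian breakpoints). It is not the authors' implementation: the function name `sax_word`, the segment count, the four-letter alphabet, and the toy series are all illustrative assumptions, and the record does not state which parameters the paper uses.

```python
from statistics import NormalDist, fmean, pstdev


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Return the SAX word of a 1-D series (illustrative sketch only)."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")

    # Step 1: z-normalize so that only the shape of the series matters.
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-normalized")
    z = [(x - mu) / sigma for x in series]

    # Step 2: piecewise aggregate approximation (mean of each equal-width segment).
    width = len(z) // n_segments
    paa = [fmean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # Step 3: breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Step 4: map each segment mean to the letter of the region it falls in.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)


# Hypothetical toy series: one steadily rising, one steadily falling.
rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = list(reversed(rising))

print(sax_word(rising))   # abcd
print(sax_word(falling))  # dcba
```

The two toy series are mirror images of each other, and their SAX words differ in every position. That is the kind of opposite-shape difference a shape-based phase can separate when a correlation-based phase alone may not.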
Pages: 167998-168007
Number of pages: 10