Estimate of time series similarity based on models

Cited by: 2
Authors
Knignitskaya T.V. [1 ]
Affiliation
[1] Yuriy Fedkovych Chernovtsy National University, Chernovtsy
Keywords
Cluster; Clustering; Distance between time series by models; DTW; ERP; Time series; Time series model
DOI
10.1615/JAutomatInfScien.v51.i8.60
Abstract
Defining a distance measure between time series is the starting point for many data mining tasks, such as clustering and classification. Clustering is a principal method of unsupervised learning: it divides data into groups based on intrinsic characteristics of the data that are not known a priori. Dividing data into clusters requires choosing a similarity metric between objects. The paper describes the main existing algorithms for computing the "distance" between time series, which handle this problem well for short time series without outliers. Outliers, which are inherent in real processes, lead to improper clustering and, consequently, to wrong decisions. The paper proposes to define the distance between time series as the distance between the (ARIMA) models of those series. In the presence of a large number of outliers, the distances given by classical methods grow linearly, whereas the proposed model-based distance behaves as a logarithmic function. It is shown that, as the number of measurements increases, the relative errors of all the classical methods remain almost unchanged, while the relative error of the model-based distance estimate is much smaller and decreases as the number of measurements grows. The main contribution of the article is the definition of a distance between time series based on the concept of a model, and the comparison of this distance with the most commonly used classical methods. Monte Carlo simulation shows that the proposed distance is more resistant to outliers and gives more accurate results for time series with a large number of observations. In addition, the computational complexity of the model-based distance algorithm is lower than that of the existing algorithms (DTW, ERP, Euclidean distance).
The use of models is undoubtedly one of the most convenient tools for studying the similarity of processes. In addition, analysis based on this algorithm can conveniently use the averaged evolutions and the limiting evolutions in the diffusion approximation scheme. Moreover, because limiting evolutions are resistant to outliers, the introduced distance can be used in clustering to build more noise-resistant clusters. © 2019 by Begell House Inc.
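The idea of measuring distance between fitted models rather than between raw observations can be illustrated with a minimal sketch. The abstract does not give the paper's exact ARIMA-based distance, so everything below is an assumption made for illustration: each series is summarized by a pure AR(p) model fitted with the Yule-Walker equations (a simplification of ARIMA), and the distance is taken as the Euclidean distance between coefficient vectors. The function names `simulate_ar1`, `fit_ar`, `model_distance`, and `euclidean_distance` are hypothetical.

```python
import math
import random


def simulate_ar1(phi, n, seed):
    """Generate n points of x[t] = phi * x[t-1] + Gaussian noise (toy data only)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def autocovariances(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of a series."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def fit_ar(x, order=2):
    """Estimate AR(order) coefficients via Yule-Walker (Levinson-Durbin recursion)."""
    r = autocovariances(x, order)
    phi, err = [], r[0]
    for k in range(1, order + 1):
        # Reflection coefficient for the step from order k-1 to order k.
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return phi


def model_distance(x, y, order=2):
    """Assumed model-based distance: Euclidean distance between AR coefficients."""
    px, py = fit_ar(x, order), fit_ar(y, order)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(px, py)))


def euclidean_distance(x, y):
    """Classical pointwise Euclidean distance between equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    same_a = simulate_ar1(0.6, 2000, seed=1)
    same_b = simulate_ar1(0.6, 2000, seed=2)   # same process, different noise
    other = simulate_ar1(-0.6, 2000, seed=3)   # different process
    print("model distance, same process:     ", model_distance(same_a, same_b))
    print("model distance, different process:", model_distance(same_a, other))
    print("Euclidean distance, same process: ", euclidean_distance(same_a, same_b))
```

In this sketch, fitting costs time linear in the series length for a fixed model order, while the comparison itself touches only the few fitted coefficients. It does not reproduce the paper's Monte Carlo experiments or its outlier analysis.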
Pages: 70-80
Page count: 10