Estimate of time series similarity based on models

Cited by: 2
Authors
Knignitskaya T.V. [1]
Affiliation
[1] Yuriy Fedkovych Chernovtsy National University, Chernovtsy
Keywords
Cluster; Clustering; Distance between time series by models; DTW; ERP; Time series; Time series model
DOI
10.1615/JAutomatInfScien.v51.i8.60
Abstract
Defining a distance between time series is the starting point for many data mining tasks, such as clustering and classification. Clustering is a principal method of unsupervised learning: it divides data into groups based on internal characteristics of the data that are not known a priori. Dividing data into clusters requires choosing a similarity metric between objects. The paper describes the main existing algorithms for computing the distance between time series, which handle this problem well for short time series in the absence of outliers. Outliers, which are inherent in real processes, lead to improper clustering and, consequently, to wrong decisions. The paper proposes defining the distance between time series as the distance between the models (ARIMA) of those series. In the presence of a large number of outliers, the distances given by classical methods grow linearly, whereas the proposed model-based distance behaves as a logarithmic function. It is shown that as the number of measurements increases, the relative errors of all the classical methods remain almost unchanged, while the relative error of the model-based distance is much smaller and decreases as the number of measurements grows. The main contribution of the article is the definition of a distance between time series based on the concept of a model, and the comparison of this distance with the most commonly used classical methods. Monte Carlo simulation shows that the proposed distance is more resistant to outliers and gives more accurate results for time series with a large number of observations. In addition, the computational complexity of the model-based distance algorithm is lower than that of the existing algorithms (DTW, ERP, Euclidean distance).
Models are undoubtedly one of the most convenient tools for studying the similarity of processes. In addition, analysis based on this algorithm can conveniently use averaged evolutions and limiting evolutions in the diffusion approximation scheme. Moreover, because limiting evolutions are resistant to outliers, the introduced distance can be used in clustering to build more noise-resistant clusters. © 2019 by Begell House Inc.
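The abstract does not state the formula for the model-based distance, so the following is only a minimal sketch of the general idea under loudly stated assumptions: each series is summarized by a pure AR(p) model (not a full ARIMA model), the coefficients are estimated with the Yule-Walker equations via the Levinson-Durbin recursion, and the distance is taken as the Euclidean distance between the two coefficient vectors. The function names `fit_ar`, `model_distance`, and `simulate_ar` are hypothetical and do not come from the paper.

```python
import math
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariances gamma(0..max_lag) of a sequence."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def fit_ar(x, order):
    """Estimate AR(order) coefficients with Yule-Walker / Levinson-Durbin.

    Assumption: a pure AR model stands in for the ARIMA model of the paper.
    """
    gamma = autocovariance(x, order)
    if gamma[0] == 0:
        raise ValueError("series is constant; AR fit is undefined")
    phi = []          # coefficients of the current AR(k-1) model
    err = gamma[0]    # prediction error variance of the current model
    for k in range(1, order + 1):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        reflection = acc / err
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - reflection * phi[k - 2 - j] for j in range(k - 1)]
        phi.append(reflection)
        err *= 1.0 - reflection ** 2
    return phi


def model_distance(x, y, order=2):
    """Illustrative distance: Euclidean distance between AR coefficients."""
    return math.dist(fit_ar(x, order), fit_ar(y, order))


def simulate_ar(phi, n, seed):
    """Generate n points of an AR process with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    p = len(phi)
    x = [0.0] * p  # zero initial conditions
    for _ in range(n):
        x.append(sum(phi[j] * x[-1 - j] for j in range(p)) + rng.gauss(0.0, 1.0))
    return x[p:]


if __name__ == "__main__":
    a = simulate_ar([0.7], 5000, seed=1)
    b = simulate_ar([0.7], 5000, seed=2)   # same model, different noise
    c = simulate_ar([-0.5], 5000, seed=3)  # different model
    print("same model:     ", model_distance(a, b, order=1))
    print("different model:", model_distance(a, c, order=1))
```

Because the comparison is made on a fixed number of fitted parameters instead of on the raw observations, two series of different lengths can be compared directly. How this sketch relates to the outlier behaviour and error rates reported in the abstract is not something it demonstrates.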
Pages: 70-80
Page count: 10