Estimate of time series similarity based on models

Cited by: 2
Author
Knignitskaya T.V. [1]
Affiliation
[1] Yuriy Fedkovych Chernovtsy National University, Chernovtsy
Keywords
Cluster; Clustering; Distance between time series by models; DTW; ERP; Time series; Time series model;
DOI
10.1615/JAutomatInfScien.v51.i8.60
Abstract
Defining a distance between time series is the starting point for many data mining tasks, such as clustering and classification. Clustering is a principal method of unsupervised learning: it divides data into groups based on internal characteristics of the data that are not known a priori. Dividing data into clusters requires choosing a similarity metric between objects. The paper describes the main existing algorithms for computing the distance between time series, which handle the problem well for short time series in the absence of outliers. Outliers, which are inherent in real processes, lead to improper clustering and, consequently, to wrong decisions. The paper proposes to define the distance between time series as the distance between the models (ARIMA) of these series. In the presence of a large number of outliers, the distances produced by classical methods grow linearly, while the proposed model-based distance behaves as a logarithmic function. It is shown that as the number of measurements increases, the relative errors of all classical methods remain almost unchanged, whereas the relative error of the model-based distance estimate is much smaller and decreases as the number of measurements grows. The main contribution of the article is the definition of a distance between time series based on the concept of a model, and the comparison of this distance with the most commonly used classical methods. Using the Monte Carlo method, it is shown that the proposed distance is more resistant to outliers and gives more accurate results for time series with a large number of observations. In addition, the computational complexity of the model-based distance algorithm is lower than that of the existing algorithms (DTW, ERP, Euclidean distance).
The use of models is one of the most convenient tools for studying the similarity of processes. In addition, for analysis based on this algorithm, it is convenient to use the averaged evolutions and the limiting evolutions in the diffusion approximation scheme. Moreover, because limiting evolutions are resistant to outliers, the introduced distance can be used in clustering to build more noise-resistant clusters. © 2019 by Begell House Inc.
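The abstract does not give the exact formula of the model-based distance, so the following is only a minimal sketch of the general idea, under stated assumptions: each series is summarized by a pure AR(p) model (a special case of ARIMA with no differencing and no moving-average part), the coefficients are estimated with the Yule-Walker equations, and the distance is taken as the Euclidean distance between coefficient vectors. The function names `fit_ar`, `model_distance`, and `simulate_ar1` are hypothetical and do not come from the paper.

```python
import math
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariances of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def fit_ar(x, order):
    """Estimate AR(order) coefficients by solving the Yule-Walker
    equations with the Levinson-Durbin recursion.
    Assumes x is not constant (non-zero variance)."""
    r = autocovariance(x, order)
    phi = []      # coefficients of the current lower-order model
    err = r[0]    # prediction error variance of the current model
    for k in range(1, order + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err  # reflection coefficient for order k
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return phi


def model_distance(x, y, order=2):
    """Illustrative model-based distance (an assumption, not the paper's
    definition): Euclidean distance between fitted AR coefficients."""
    a, b = fit_ar(x, order), fit_ar(y, order)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def euclidean_distance(x, y):
    """Classical point-wise Euclidean distance between equal-length series."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def simulate_ar1(phi, n, rng):
    """Generate n observations of x_t = phi * x_{t-1} + Gaussian noise."""
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series


rng = random.Random(0)
same_a = simulate_ar1(0.6, 2000, rng)   # two realizations of one process
same_b = simulate_ar1(0.6, 2000, rng)
other = simulate_ar1(-0.6, 2000, rng)   # a process with different dynamics

print("model distance, same process:     ", model_distance(same_a, same_b, 1))
print("model distance, different process:", model_distance(same_a, other, 1))
print("Euclidean distance, same process: ", euclidean_distance(same_a, same_b))
```

In this sketch, two independent realizations of the same process are compared through their fitted parameters rather than point by point, which is the intuition behind a model-based distance. Once the models are fitted, the comparison cost depends only on the model order, not on the series length. A full ARIMA fit and the outlier experiments described in the abstract are outside the scope of this example.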
Pages: 70-80
Number of pages: 10