Time series representation and similarity based on local autopatterns

Cited by: 1
Authors
Mustafa Gokce Baydogan
George Runger
Affiliations
[1] Boğaziçi University, Department of Industrial Engineering
[2] Arizona State University, School of Computing, Informatics and Decision Systems Engineering
Source
Data Mining and Knowledge Discovery | 2016 / Vol. 30
Keywords
Time series; Similarity; Pattern discovery; Autoregression; Regression tree
DOI
Not available
Abstract
Time series data mining has received much greater interest along with the increase in temporal data sets from different domains such as medicine, finance, multimedia, etc. Representations are important to reduce dimensionality and generate useful similarity measures. High-level representations such as Fourier transforms, wavelets, piecewise polynomial models, etc., were considered previously. Recently, autoregressive kernels were introduced to reflect the similarity of the time series. We introduce a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local autopatterns. Our approach generates a pattern-based representation along with a similarity measure called learned pattern similarity (LPS). A tree-based ensemble-learning strategy that is fast and insensitive to parameter settings is the basis for the approach. Then, a robust similarity measure based on the learned patterns is presented. This unsupervised approach to represent and measure the similarity between time series generally applies to a number of data mining tasks (e.g., clustering, anomaly detection, classification). Furthermore, an embedded learning of the representation avoids pre-defined features and an extraction step which is common in some feature-based approaches. The method generalizes in a straightforward manner to multivariate time series. The effectiveness of LPS is evaluated on time series classification problems from various domains. We compare LPS to eleven well-known similarity measures. Our experimental results show that LPS provides fast and competitive results on benchmark datasets from several domains. Furthermore, LPS provides a research direction and template approach that breaks from the linear dependency models to potentially foster other promising nonlinear approaches.
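The idea in the abstract (learn local patterns with a tree ensemble, then compare series by the patterns they share) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified setup in which each tree predicts the next value from a few lagged values, each leaf is treated as one learned pattern, and similarity is a histogram intersection over leaf counts. All function names, parameters (`n_lags`, `depth`, `n_trees`), and the toy series are hypothetical.

```python
import math
import random
from collections import Counter


def lagged_rows(series, n_lags):
    """Turn a series into (past n_lags values, next value) rows."""
    return [(tuple(series[t - n_lags:t]), series[t])
            for t in range(n_lags, len(series))]


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, depth, rng, next_leaf):
    """Grow a regression tree that splits on one randomly chosen lag per node."""
    lag = rng.randrange(len(rows[0][0]))
    thresholds = sorted({x[lag] for x, _ in rows})[1:]
    if depth == 0 or len(rows) < 4 or not thresholds:
        next_leaf[0] += 1
        return ("leaf", next_leaf[0] - 1)
    # Pick the threshold that minimizes the squared error of the target.
    thr = min(thresholds,
              key=lambda c: sse([y for x, y in rows if x[lag] < c])
              + sse([y for x, y in rows if x[lag] >= c]))
    left = [r for r in rows if r[0][lag] < thr]
    right = [r for r in rows if r[0][lag] >= thr]
    return ("split", lag, thr,
            build_tree(left, depth - 1, rng, next_leaf),
            build_tree(right, depth - 1, rng, next_leaf))


def leaf_of(tree, x):
    """Follow the splits until a leaf is reached and return its id."""
    while tree[0] == "split":
        _, lag, thr, left, right = tree
        tree = left if x[lag] < thr else right
    return tree[1]


def fit_forest(all_series, n_trees=20, n_lags=3, depth=4, seed=0):
    """Unsupervised fit: pool the rows of every series, ignoring class labels."""
    rng = random.Random(seed)
    pooled = [row for s in all_series for row in lagged_rows(s, n_lags)]
    return [build_tree(pooled, depth, rng, [0]) for _ in range(n_trees)]


def represent(forest, series, n_lags=3):
    """Count the rows of a series per (tree, leaf); n_lags must match the fit."""
    counts = Counter()
    for x, _ in lagged_rows(series, n_lags):
        for i, tree in enumerate(forest):
            counts[(i, leaf_of(tree, x))] += 1
    return counts


def similarity(rep_a, rep_b):
    """Histogram intersection of leaf counts, scaled to [0, 1]."""
    shared = sum(min(c, rep_b[k]) for k, c in rep_a.items())
    return shared / min(sum(rep_a.values()), sum(rep_b.values()))


# Toy data (assumed for illustration): two sine waves and a sawtooth.
sine = [math.sin(0.3 * t) for t in range(60)]
shifted = [math.sin(0.3 * t + 0.5) for t in range(60)]
saw = [(t % 7) / 7.0 for t in range(60)]

forest = fit_forest([sine, shifted, saw])
reps = [represent(forest, s) for s in (sine, shifted, saw)]
print(similarity(reps[0], reps[1]), similarity(reps[0], reps[2]))
```

Because the trees are fit on pooled, unlabeled rows, the resulting representation could in principle feed clustering or nearest-neighbor classification alike, which matches the task-agnostic framing of the abstract. The published method builds its trees over segments of the series rather than fixed lags, so this sketch should be read only as an intuition aid.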
Pages: 476-509
Page count: 33