Optimizing dynamic time warping’s window width for time series data mining applications

Authors
Hoang Anh Dau
Diego Furtado Silva
François Petitjean
Germain Forestier
Anthony Bagnall
Abdullah Mueen
Eamonn Keogh
Affiliations
[1] University of California, Riverside
[2] Universidade Federal de São Carlos
[3] Monash University
[4] University of Haute-Alsace
[5] University of East Anglia
[6] University of New Mexico
Source
Data Mining and Knowledge Discovery | 2018, Vol. 32
Keywords
Time series; Clustering; Classification; Dynamic time warping; Semi-supervised learning
Abstract
Dynamic Time Warping (DTW) is a highly competitive distance measure for most time series data mining problems. Obtaining the best performance from DTW requires setting its only parameter, the maximum amount of warping (w). In the supervised case with ample data, w is typically set by cross-validation in the training stage. However, this method is likely to yield suboptimal results for small training sets. For the unsupervised case, learning via cross-validation is not possible because we do not have access to labeled data. Many practitioners have thus resorted to assuming that "the larger the better", and they use the largest value of w permitted by the computational resources. However, as we will show, in most circumstances this is a naïve approach that produces inferior clusterings. Moreover, the best warping window width is generally non-transferable between the two tasks, i.e., for a single dataset, practitioners cannot simply apply the best w learned for classification to clustering, or vice versa. In addition, we will demonstrate that the appropriate amount of warping depends not only on the structure of the data, but also on the dataset size. Thus, even if a practitioner knows the best setting for a given dataset, they will likely be at a loss if they apply that setting to a larger version of that dataset. All these issues seem largely unknown, or at least unappreciated, in the community. In this work, we demonstrate the importance of setting DTW's warping window width correctly, and we also propose novel methods to learn this parameter in both supervised and unsupervised settings. The algorithms we propose to learn w can produce significant improvements in classification accuracy and clustering quality. We demonstrate the correctness of our novel observations and the utility of our ideas by testing them on more than one hundred publicly available datasets.
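To make the abstract's central object concrete, the following is a minimal, illustrative sketch of DTW constrained by a warping window, i.e., a Sakoe-Chiba band of half-width w, for equal-length series. The function name and the convention of expressing w in absolute points (rather than as a percentage of series length, as is common in this literature) are our assumptions, not the authors' code.

```python
import numpy as np

def dtw_distance(x, y, w):
    """DTW distance between equal-length 1-D series x and y, restricted
    to a Sakoe-Chiba band of half-width w (the warping window).
    w = 0 forces lock-step alignment (Euclidean distance);
    w >= len(x) - 1 allows unconstrained warping."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        # only cells with |i - j| <= w are reachable inside the band
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return float(np.sqrt(D[n, m]))

x = np.array([0.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0, 2.0])
print(dtw_distance(x, y, 0))  # 1.414... (w = 0: plain Euclidean distance)
print(dtw_distance(x, y, 1))  # 0.0 (one point of warping aligns the series)
```

The supervised selection the abstract describes then amounts to looping candidate values of w over this distance and keeping the one that maximizes leave-one-out 1-NN accuracy on the training set; the abstract's point is that this simple recipe breaks down for small training sets and is unavailable in the unsupervised case.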
Our compelling results allow us to make a perhaps unexpected claim: an underappreciated "low-hanging fruit" in optimizing DTW's performance can produce improvements that make it an even stronger baseline, closing most or all of the improvement gap of the more sophisticated methods proposed in recent years.
Pages: 1074–1120 (46 pages)