Fast and Accurate Time Series Classification with WEASEL

Cited by: 151
Authors
Schaefer, Patrick [1]
Leser, Ulf [1]
Affiliation
[1] Humboldt Univ, Berlin, Germany
Source
CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2017
Keywords
Time series; classification; feature selection; bag-of-patterns; word co-occurrences
DOI
10.1145/3132847.3132980
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices based on their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods for time series classification (TSC) cannot cope with such data volumes at acceptable accuracy; they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes. In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method which is both fast and accurate. Like other state-of-the-art TSC methods, WEASEL uses a sliding-window approach to transform time series into feature vectors, which are then analyzed by a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms at orders-of-magnitude lower classification and training times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even for mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it achieves, out of the box, almost the same accuracy as highly tuned, domain-specific methods.
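The general sliding-window, bag-of-patterns idea mentioned in the abstract can be illustrated with a minimal sketch. This is not WEASEL's feature derivation, which the abstract does not detail: the equal-width quantization, the three-letter alphabet, and the nearest-neighbor step are simple stand-ins chosen for illustration, and all function names (`window_words`, `bag_of_patterns`, `classify`) are hypothetical.

```python
from collections import Counter


def window_words(series, window=4, alphabet="abc"):
    """Slide a fixed-length window over the series; map each window to a word.

    Assumption: values are quantized into equal-width bins over the
    series' own range (a stand-in, not the paper's transformation).
    """
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    top = len(alphabet) - 1
    words = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        word = "".join(
            alphabet[min(int((v - lo) / span * len(alphabet)), top)]
            for v in chunk
        )
        words.append(word)
    return words


def bag_of_patterns(series, window=4):
    """Feature vector: how often each word occurs in the series."""
    return Counter(window_words(series, window))


def histogram_distance(a, b):
    """Squared Euclidean distance between two word histograms."""
    return sum((a[k] - b[k]) ** 2 for k in set(a) | set(b))


def classify(train, query, window=4):
    """Label the query with the class of the closest training histogram.

    Assumption: 1-nearest-neighbor replaces the machine learning
    classifier mentioned in the abstract.
    """
    query_bag = bag_of_patterns(query, window)
    nearest = min(
        train,
        key=lambda item: histogram_distance(
            bag_of_patterns(item[0], window), query_bag
        ),
    )
    return nearest[1]


train = [
    ([0, 1, 2, 3, 4, 5, 6, 7], "rising"),
    ([0, 7, 0, 7, 0, 7, 0, 7], "oscillating"),
]
print(classify(train, [0, 1, 2, 3, 4, 5, 6, 8]))
```

The word histogram discards the position of each pattern, so two series with the same local shapes in a different order map to the same feature vector in this sketch.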
Pages: 637-646
Page count: 10