Highly Comparative Feature-Based Time-Series Classification

Cited by: 205
Authors
Fulcher, Ben D. [1 ]
Jones, Nick S. [2 ]
Affiliations
[1] Univ Oxford, Dept Phys, Clarendon Lab, Oxford OX1 3PU, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Math, London, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Time-series analysis; classification; data mining; PATTERN-RECOGNITION; REPRESENTATION; SELECTION;
DOI
10.1109/TKDE.2014.2316504
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A highly comparative, feature-based approach to time-series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large data sets containing long time series or time series of different lengths. For many of the data sets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distances and dynamic time warping. Most importantly, the features selected provide an understanding of the properties of the data set, insight that can guide further scientific investigation.
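The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three features, the toy data, the helper names (`lag1_autocorr`, `extract`, `centroid_accuracy`, `greedy_forward_selection`, `make_series`) and the use of a nearest-centroid rule as the linear classifier are all assumptions made for illustration, and candidates are scored by training accuracy only.

```python
import math
import random
import statistics


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a sequence (0.0 for a constant sequence)."""
    m = statistics.fmean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    return sum((a - m) * (b - m) for a, b in zip(x, x[1:])) / denom


# Hypothetical stand-in for the thousands of features in the paper.
FEATURES = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "lag1_autocorr": lag1_autocorr,
}


def extract(series_list):
    """Map each series, whatever its length, to a fixed set of named features."""
    return [{name: f(x) for name, f in FEATURES.items()} for x in series_list]


def centroid_accuracy(rows, labels, names):
    """Training accuracy of a nearest-centroid rule on the chosen features.

    With two classes and Euclidean distance the decision boundary is linear.
    """
    classes = sorted(set(labels))
    centroids = {
        c: [statistics.fmean(r[n] for r, y in zip(rows, labels) if y == c)
            for n in names]
        for c in classes
    }
    correct = 0
    for r, y in zip(rows, labels):
        point = [r[n] for n in names]
        pred = min(classes, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == y
    return correct / len(labels)


def greedy_forward_selection(rows, labels, max_features=2):
    """Add the single best feature at each step; stop when accuracy stalls."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [n for n in FEATURES if n not in selected]
        if not candidates:
            break
        score, name = max(
            (centroid_accuracy(rows, labels, selected + [n]), n)
            for n in candidates
        )
        if score <= best_score:
            break
        selected.append(name)
        best_score = score
    return selected, best_score


def make_series(kind, rng):
    """Toy data: white noise or a unit-variance AR(1) process, random length."""
    n = rng.randint(200, 400)
    if kind == "noise":
        return [rng.gauss(0, 1) for _ in range(n)]
    phi = 0.9
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + math.sqrt(1 - phi ** 2) * rng.gauss(0, 1))
    return x


rng = random.Random(0)
labels = ["noise"] * 20 + ["ar1"] * 20
series = [make_series(kind, rng) for kind in labels]
rows = extract(series)
selected, score = greedy_forward_selection(rows, labels)
print(selected, score)
```

Because every series is reduced to the same few numbers, series of different lengths can be compared without computing any distance between the raw sequences. The selected feature names also indicate which property separates the classes, here most likely the correlation structure.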
Pages: 3026-3037
Page count: 12
Related papers
42 in total
[1]  
[Anonymous], 33 PHIL U MARB
[2]  
Batista G. E., 2011, P 2011 SIAM INT C DA, P699, DOI 10.1137/1.9781611972818.60
[3]  
Berndt DJ., 1994, USING DYNAMIC TIME W, DOI 10.5555/3000850.3000887
[4]   Locally adaptive dimensionality reduction for indexing large time series databases [J].
Chakrabarti, K ;
Keogh, E ;
Mehrotra, S ;
Pazzani, M .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2002, 27 (02) :188-228
[5]   A time series forest for classification and feature extraction [J].
Deng, Houtao ;
Runger, George ;
Tuv, Eugene ;
Vladimir, Martyanov .
INFORMATION SCIENCES, 2013, 239 :142-153
[6]  
Ding H, 2008, PROC VLDB ENDOW, V1, P1542
[7]   Genetic algorithms and support vector machines for time series classification [J].
Eads, D ;
Hill, D ;
Davis, S ;
Perkins, S ;
Ma, JS ;
Porter, R ;
Theiler, J .
APPLICATIONS AND SCIENCE OF NEURAL NETWORKS, FUZZY SYSTEMS, AND EVOLUTIONARY COMPUTATION V, 2002, 4787 :74-85
[8]   Highly comparative time-series analysis: the empirical structure of time series and their methods [J].
Fulcher, Ben D. ;
Little, Max A. ;
Jones, Nick S. .
JOURNAL OF THE ROYAL SOCIETY INTERFACE, 2013, 10 (83)
[9]  
Gan G., 2007, DATA CLUSTERING THEO
[10]  
Gandhi A., 2002, THESIS OREGON STATE