Classification of multivariate time series via temporal abstraction and time intervals mining

Cited by: 76
Authors
Moskovitch, Robert [1, 2]
Shahar, Yuval [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Informat Syst Engn, Beer Sheva, Israel
[2] Columbia Univ, Dept Biomed Informat Syst Biol & Med, New York, NY USA
Keywords
Temporal knowledge discovery; Temporal abstraction; Time intervals mining; Frequent pattern mining; Classification; KNOWLEDGE DISCOVERY;
DOI
10.1007/s10115-014-0784-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification of multivariate time series data, often including both time points and intervals at variable frequencies, is a challenging task. We introduce the KarmaLegoSification (KLS) framework for the classification of multivariate time series, which implements three phases: (1) application of a temporal abstraction process that transforms a series of raw time-stamped data points into a series of symbolic time intervals; (2) mining these symbolic time intervals to discover frequent time-interval-related patterns (TIRPs), using Allen's temporal relations; and (3) using the TIRPs as features to induce a classifier. To efficiently detect multiple TIRPs (features) in a single entity to be classified, we introduce a new algorithm, SingleKarmaLego, which can be shown to be superior for that purpose to a Sequential TIRPs Detection algorithm. We evaluated the KLS framework on datasets in the domains of diabetes, intensive care, and infectious hepatitis, assessing the effects of the various settings of the KLS framework. Discretization using Symbolic Aggregate approXimation (SAX) led to better performance than equal-width discretization (EWD); knowledge-based cut-off definitions, when available, were superior to both. Using three abstract temporal relations was superior to using the seven core temporal relations. Using an epsilon value larger than zero tended to result in slightly better accuracy with the SAX discretization method but in reduced accuracy with EWD, and overall does not seem beneficial. No feature selection method we tried proved useful. Regarding feature (TIRP) representation, mean duration performed better than horizontal support, which in turn performed better than the default binary (existence) representation method.
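The first two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`equal_width_bins`, `abstract_to_intervals`, `allen_relation`), the state labels `S0`, `S1`, ..., and the tuple layout are all hypothetical, epsilon is taken to be zero, and the seven relation names are assumed to be before, meets, overlaps, finished-by, contains, starts, and equals for a pair of intervals ordered by start time and then end time. The sketch shows equal-width discretization of one variable into symbolic time intervals, followed by the computation of the temporal relation between two such intervals.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout for a symbolic time interval: (start, end, symbol).
Interval = Tuple[float, float, str]


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def abstract_to_intervals(times: Sequence[float],
                          values: Sequence[float],
                          k: int) -> List[Interval]:
    """Merge consecutive time points that share a bin into one interval."""
    states = equal_width_bins(values, k)
    intervals: List[Interval] = []
    start = prev_t = times[0]
    current = states[0]
    for t, s in zip(times[1:], states[1:]):
        if s != current:
            intervals.append((start, prev_t, f"S{current}"))
            start, current = t, s
        prev_t = t
    intervals.append((start, prev_t, f"S{current}"))
    return intervals


def allen_relation(a: Interval, b: Interval) -> str:
    """Relation of a to b, assuming (start, end) of a <= (start, end) of b."""
    a_s, a_e = a[0], a[1]
    b_s, b_e = b[0], b[1]
    if (a_s, a_e) > (b_s, b_e):
        raise ValueError("intervals must be ordered by start, then end")
    if a_s == b_s:
        return "equals" if a_e == b_e else "starts"
    # From here on, a starts strictly before b.
    if a_e < b_s:
        return "before"
    if a_e == b_s:
        return "meets"
    if a_e < b_e:
        return "overlaps"
    if a_e == b_e:
        return "finished-by"
    return "contains"
```

For example, `abstract_to_intervals([0, 1, 2, 3, 4, 5], [1.0, 1.2, 5.0, 5.5, 9.0, 9.5], 3)` groups the six time points into three intervals, one per bin. A TIRP would then be a set of such intervals from several variables together with the pairwise relations returned by `allen_relation`; the mining and classification phases are outside the scope of this sketch.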
Pages: 35-74
Number of pages: 40