Clustering Multivariate Time Series Using Hidden Markov Models

Cited by: 72
Authors
Ghassempour, Shima [1, 2]
Girosi, Federico [2]
Maeder, Anthony [1, 2]
Affiliations
[1] Univ Western Sydney, Sch Comp Engn & Math, Campbelltown, NSW 2751, Australia
[2] Univ Western Sydney, Ctr Hlth Res, Campbelltown, NSW 2751, Australia
Keywords
health trajectory; HMM; clustering; Fréchet distance
DOI
10.3390/ijerph110302741
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline Classification Code
08; 0830
Abstract
In this paper we describe an algorithm for clustering multivariate time series whose variables take both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs): we first map each trajectory into an HMM, then define a suitable distance between HMMs, and finally cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab. It may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge and are therefore accessible to a wide range of researchers.
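The three steps named in the abstract (one HMM per trajectory, a distance between HMMs, clustering on the distance matrix) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the trajectories have a single categorical variable, the HMM parameters are hard-coded rather than fitted (fitting, for example with Baum-Welch, is omitted), the distance is a symmetrized per-observation log-likelihood difference, and the clustering is plain average-linkage agglomeration. The paper's own distance and clustering method may differ.

```python
import math

def log_likelihood(seq, hmm):
    """Scaled forward algorithm: log P(seq | hmm) for a discrete-emission HMM.
    hmm = (pi, A, B): initial probs, transition matrix, emission matrix."""
    pi, A, B = hmm
    n = len(pi)
    alpha = [pi[s] * B[s][seq[0]] for s in range(n)]
    total = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][seq[t]]
                     for s in range(n)]
        scale = sum(alpha)          # P(o_t | o_1..o_{t-1})
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

def hmm_distance(seq_i, hmm_i, seq_j, hmm_j):
    """Assumed dissimilarity: how much worse each model explains the other
    trajectory than that trajectory's own model, per observation, averaged."""
    d_ij = (log_likelihood(seq_j, hmm_j) - log_likelihood(seq_j, hmm_i)) / len(seq_j)
    d_ji = (log_likelihood(seq_i, hmm_i) - log_likelihood(seq_i, hmm_j)) / len(seq_i)
    return 0.5 * (d_ij + d_ji)

def distance_matrix(trajectories):
    """trajectories: list of (sequence, hmm) pairs."""
    n = len(trajectories)
    return [[hmm_distance(*trajectories[i], *trajectories[j]) for j in range(n)]
            for i in range(n)]

def average_linkage(dist, k):
    """Agglomerative clustering on a distance matrix, stopping at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        merged = clusters.pop(b)    # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical, hand-written HMMs standing in for per-trajectory fitted models.
trans_1 = [[0.7, 0.3], [0.4, 0.6]]
trans_2 = [[0.8, 0.2], [0.3, 0.7]]
trajectories = [
    ([0, 0, 0, 1, 0, 0], ([0.6, 0.4], trans_1, [[0.90, 0.10], [0.8, 0.2]])),
    ([0, 0, 0, 0, 0, 0], ([0.5, 0.5], trans_2, [[0.85, 0.15], [0.8, 0.2]])),
    ([1, 1, 0, 1, 1, 1], ([0.6, 0.4], trans_1, [[0.10, 0.90], [0.2, 0.8]])),
    ([1, 1, 1, 1, 1, 1], ([0.5, 0.5], trans_2, [[0.15, 0.85], [0.2, 0.8]])),
]

dist = distance_matrix(trajectories)
labels = average_linkage(dist, k=2)
print(labels)
```

In this toy input the two trajectories dominated by symbol 0 fall into one cluster and the two dominated by symbol 1 into the other. Because the clustering step only needs the distance matrix, it could be swapped for any other distance-matrix method without touching the first two steps.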
Pages: 2741-2763
Page count: 23