Model-based clustering of time series in group-specific functional subspaces

Cited by: 115
Authors
Bouveyron, Charles [2]
Jacques, Julien [1]
Affiliations
[1] Univ Lille 1, UFR Math, INRIA Lille Nord Europe, Lab Paul Painleve, UMR CNRS 8524, F-59655 Villeneuve d'Ascq, France
[2] Univ Paris 01, Lab SAMM, EA 4543, F-75013 Paris, France
Keywords
Functional data; Time series clustering; Model-based clustering; Group-specific functional subspaces; Functional PCA; CLASSIFICATION;
DOI
10.1007/s11634-011-0095-6
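Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code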
020208 ; 070103 ; 0714 ;
Abstract
This work develops a general procedure for clustering functional data that adapts the high-dimensional data clustering (HDDC) method, originally proposed in the multivariate context. The resulting clustering method, called funHDDC, is based on a functional latent mixture model that fits the functional data in group-specific functional subspaces. By constraining model parameters within and between groups, a family of parsimonious models is obtained that can accommodate a variety of situations. An estimation procedure based on the EM algorithm is proposed for determining both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs as well as or better than classical two-step clustering methods, while providing useful interpretations of the groups and avoiding the difficult choice of a discretization technique. In particular, funHDDC appears to always outperform HDDC applied to spline coefficients.
Pages: 281-300
Number of pages: 20
Related papers
24 items in total
[1]   Using basis expansions for estimating functional PLS regression Applications with chemometric data [J].
Aguilera, Ana M. ;
Escabias, Manuel ;
Preda, Cristian ;
Saporta, Gilbert .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2010, 104 (02) :289-305
[2]   MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[3]   Initializing EM using the properties of its trajectories in Gaussian mixtures [J].
Biernacki, C .
STATISTICS AND COMPUTING, 2004, 14 (03) :267-279
[4]   High-dimensional data clustering [J].
Bouveyron, C. ;
Girard, S. ;
Schmid, C. .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) :502-519
[5]   SCREE TEST FOR NUMBER OF FACTORS [J].
CATTELL, RB .
MULTIVARIATE BEHAVIORAL RESEARCH, 1966, 1 (02) :245-276
[6]   GAUSSIAN PARSIMONIOUS CLUSTERING MODELS [J].
CELEUX, G ;
GOVAERT, G .
PATTERN RECOGNITION, 1995, 28 (05) :781-793
[7]   DEFINING PROBABILITY DENSITY FOR A DISTRIBUTION OF RANDOM FUNCTIONS [J].
Delaigle, Aurore ;
Hall, Peter .
ANNALS OF STATISTICS, 2010, 38 (02) :1171-1193
[8]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[9]   Modeling environmental data by functional principal component logistic regression [J].
Escabias, M ;
Aguilera, AM ;
Valderrama, MJ .
ENVIRONMETRICS, 2005, 16 (01) :95-107
[10]  
Ferraty F., 2006, SPR S STAT