KmL: k-means for longitudinal data

Cited by: 175
Authors
Genolini, Christophe [1, 2]
Falissard, Bruno [1, 3, 4, 5]
Affiliations
[1] INSERM, U669, Paris, France
[2] Univ Paris Ouest Nanterre Def, Paris, France
[3] Univ Paris Sud, Paris, France
[4] Univ Paris 05, UMR S0669, Paris, France
[5] Hop Paul Brousse, AP HP, Dept Sante Publ, Villejuif, France
Keywords
Functional analysis; Longitudinal data; k-means; Cluster analysis; Non-parametric algorithm; CLUSTERING ALGORITHMS; SAS PROCEDURE; TRAJECTORIES; MODELS; NUMBER
DOI
10.1007/s00180-009-0178-4
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Cohort studies are becoming essential tools in epidemiological research. In these studies, measurements are not restricted to single variables but can be seen as trajectories. Statistical methods used to identify homogeneous patient trajectories fall into two families: model-based methods (such as Proc Traj) and partitional clustering (non-parametric algorithms such as k-means). KmL is a new implementation of k-means designed specifically for longitudinal data. It can handle missing values and runs the algorithm several times, varying the starting conditions and/or the number of clusters sought; its graphical interface helps the user choose the appropriate number of clusters when the classic criterion is not efficient. To assess the efficiency of KmL, we compare its performance with that of Proc Traj on both artificial and real data. The two techniques produce very similar clusterings when trajectories follow polynomial curves; KmL gives much better results on non-polynomial trajectories.
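The procedure summarized in the abstract (k-means applied to whole trajectories, rerun with different starting conditions and different numbers of clusters, then compared with a quality criterion) can be illustrated with a minimal Python sketch. This is not the KmL package or its API: the function names `kmeans_trajectories`, `calinski_harabasz`, and `select_partitions` are hypothetical, the Calinski-Harabasz index is assumed here as the "classic criterion", and missing-value handling is left out.

```python
import numpy as np

def kmeans_trajectories(X, k, rng, n_iter=100):
    """Plain Lloyd k-means. Each row of X is one trajectory (one value per time point)."""
    # Starting condition: k distinct trajectories picked at random as initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every trajectory to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Mean trajectory of each cluster; an empty cluster keeps its old center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion (assumed quality criterion)."""
    n = len(X)
    clusters = np.unique(labels)
    k = len(clusters)
    if k < 2 or k >= n:
        return -np.inf
    overall = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for j in clusters:
        members = X[labels == j]
        center = members.mean(axis=0)
        between += len(members) * ((center - overall) ** 2).sum()
        within += ((members - center) ** 2).sum()
    if within == 0.0:
        return np.inf
    return (between / (k - 1)) / (within / (n - k))

def select_partitions(X, k_values, n_restarts=20, seed=0):
    """For each k, rerun k-means from several random starts and keep the best-scoring run."""
    rng = np.random.default_rng(seed)
    best = {}
    for k in k_values:
        for _ in range(n_restarts):
            labels, _ = kmeans_trajectories(X, k, rng)
            score = calinski_harabasz(X, labels)
            if k not in best or score > best[k][0]:
                best[k] = (score, labels)
    return best  # maps k -> (criterion value, cluster labels)
```

Comparing the criterion values across the returned entries gives one candidate number of clusters; the abstract notes that this criterion is not always efficient, which is why KmL also offers a graphical interface for the choice.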
Pages: 317-328
Page count: 12