A Framework for Multiple Imputation in Cluster Analysis

Cited by: 58
Authors
Basagana, Xavier [1, 2, 3]
Barrera-Gomez, Jose [1, 2, 3]
Benet, Marta [1, 2, 3]
Anto, Josep M. [1, 2, 3, 4]
Garcia-Aymerich, Judith [1, 2, 3, 4]
Affiliations
[1] Ctr Res Environm Epidemiol, Barcelona 08003, Catalonia, Spain
[2] Hosp del Mar, Res Inst, Barcelona, Spain
[3] CIBERESP, Barcelona, Spain
[4] Univ Pompeu Fabra, Fac Hlth & Life Sci, Dept Expt & Hlth Sci, Barcelona, Spain
Keywords
classification; cluster analysis; imputation; missing data; fully conditional specification
DOI
10.1093/aje/kws289
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Subject classification code
1004; 120402
Abstract
Multiple imputation is a common technique for dealing with missing values and is mostly applied in regression settings. Its application in cluster analysis problems, where the main objective is to classify individuals into homogeneous groups, involves several difficulties that are not well characterized in the current literature. In this paper, we propose a framework for applying multiple imputation to cluster analysis when the original data contain missing values. The proposed framework incorporates the selection of the final number of clusters and a variable reduction procedure, which may be needed in data sets where the ratio of the number of persons to the number of variables is small. We suggest some ways to report how the uncertainty due to multiple imputation of missing data affects the cluster analysis outcomes, namely the final number of clusters, the results of a variable selection procedure (if applied), and the assignment of individuals to clusters. The proposed framework is illustrated with data from the Phenotype and Course of Chronic Obstructive Pulmonary Disease (PAC-COPD) Study (Spain, 2004-2008), which aimed to classify patients with chronic obstructive pulmonary disease into different disease subtypes.
Pages: 718-725
Number of pages: 8
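
The abstract mentions reporting how imputation uncertainty affects the final number of clusters and the assignment of individuals to clusters. The sketch below is one possible way to summarize such results and is not taken from the paper: it assumes a clustering has already been run on each imputed data set, counts how often each number of clusters was selected, and computes, for each pair of individuals, the fraction of imputations in which they share a cluster (a summary that does not depend on arbitrary cluster labels). The function name `summarize_imputations` and the toy label lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def summarize_imputations(labelings):
    """Summarize cluster results across imputed data sets.

    labelings: list with one entry per imputed data set; each entry is a
    list of cluster labels for the same individuals in the same order.
    Returns (k_counts, co_assignment).
    """
    m = len(labelings)          # number of imputed data sets
    n = len(labelings[0])       # number of individuals

    # How often each number of clusters appears across imputations.
    k_counts = Counter(len(set(labels)) for labels in labelings)

    # Fraction of imputations in which individuals i and j share a cluster.
    # Comparing pairs avoids having to match cluster labels across runs.
    co_assignment = {}
    for i, j in combinations(range(n), 2):
        same = sum(1 for labels in labelings if labels[i] == labels[j])
        co_assignment[(i, j)] = same / m
    return k_counts, co_assignment


# Toy input (made up for illustration): 4 imputations, 5 individuals.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 2],  # same partition as above, labels permuted
    [0, 0, 1, 1, 1],  # only two clusters found in this imputation
    [0, 0, 0, 1, 2],
]
k_counts, co_assignment = summarize_imputations(labelings)
print(dict(k_counts))
print(co_assignment[(2, 3)])
```

Pairs with a co-assignment fraction near 0 or 1 are classified consistently across imputations, while intermediate values flag individuals whose cluster membership is sensitive to the imputed values.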