Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures

Cited by: 9
Authors
Faucheux, Lilith [1, 2]
Resche-Rigon, Matthieu [1, 3]
Curis, Emmanuel [3, 4]
Soumelis, Vassili [2, 5]
Chevret, Sylvie [1, 3]
Affiliations
[1] Univ Paris, Sorbonne Paris Cite, ECSTRRA Team, INSERM UMR1153, Paris, France
[2] Univ Paris, Sorbonne Paris Cite, INSERM U976, Paris, France
[3] Hop St Louis, AP HP, Serv Biostat & Informat Med, Paris, France
[4] Univ Paris, Sorbonne Paris Cite, Lab Biomath Plateau IB2 EA 7537 BioSTM, Fac Pharm, Paris, France
[5] Hop St Louis, AP HP, Lab Immunol Biol & Histocompatibil, Paris, France
Keywords
breast cancer; clustering; consensus; left-censored data; missing data; multiple imputation; LIMIT; QUANTIFICATION; INFERENCE; IMPACT;
DOI
10.1002/bimj.201900366
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left-censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete-case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left-censoring and do not allow the number of clusters to vary across the partitions of each imputed dataset. Here, we developed a consensus-based clustering algorithm in which left-censored data are taken into account using a modified multiple imputation method and the number of clusters is estimated for each imputed dataset. A simulation study was conducted to assess the performance in terms of the number of clusters, the percentage of unclassified observations, and the adjusted Rand index. The simulation results showed that the investigated method works well compared to several alternative approaches. A real-world application in breast cancer patients showed that the proposed method may reveal novel clusters of patients.
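The abstract names the adjusted Rand index as one of its performance measures. As a minimal sketch of how that index compares two partitions of the same observations, the standard-library Python below computes it from pair counts. The function name `adjusted_rand_index` is hypothetical, and the sketch assumes every observation carries a label in both partitions; it does not model the unclassified observations the study also reports, and it is not the authors' implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same observations.

    Hypothetical helper for illustration; assumes every observation has a
    label in both partitions (no unclassified observations).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same observations")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two observations: partitions trivially agree

    # Pairs of observations placed together in both partitions.
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2

    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same grouping with swapped label names: perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Groupings that never agree on any pair: below chance level.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index is invariant to how clusters are named, which is why it suits comparing an estimated partition with a simulated reference partition even when the number of clusters differs between the two.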
Pages: 372-393
Number of pages: 22
Related papers
50 in total
  • [21] Multiple imputation with missing indicators as proxies for unmeasured variables: simulation study
    Sperrin, Matthew
    Martin, Glen P.
    BMC MEDICAL RESEARCH METHODOLOGY, 2020, 20 (01)
  • [22] Imputation method for missing data based on clustering and measure of property
    Kim, Sunghyun
    Kim, Dongjae
    KOREAN JOURNAL OF APPLIED STATISTICS, 2018, 31 (01) : 29 - 40
  • [23] Multiple imputation based on restricted mean model for censored data
    Liu, Lyrica Xiaohong
    Murray, Susan
    Tsodikov, Alex
    STATISTICS IN MEDICINE, 2011, 30 (12) : 1339 - 1350
  • [24] A Simulation Study Comparing Multiple Imputation Methods for Incomplete Longitudinal Ordinal Data
    Donneau, A. F.
    Mauer, M.
    Molenberghs, G.
    Albert, A.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2015, 44 (05) : 1311 - 1338
  • [25] Latent class based multiple imputation approach for missing categorical data
    Gebregziabher, Mulugeta
    DeSantis, Stacia M.
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2010, 140 (11) : 3252 - 3262
  • [26] Multiple imputation for demographic hazard models with left-censored predictor variables: Application to employment duration and fertility in the EU-SILC
    Rendall, Michael S.
    Greulich, Angela
    DEMOGRAPHIC RESEARCH, 2016, 35 : 1135 - 1148
  • [27] Comparing multiple imputation methods for systematically missing subject-level data
    Kline, David
    Andridge, Rebecca
    Kaizar, Eloise
    RESEARCH SYNTHESIS METHODS, 2017, 8 (02) : 136 - 148
  • [28] Multiple imputation method of missing credit risk assessment data based on generative adversarial networks
    Zhao, Feng
    Lu, Yan
    Li, Xinning
    Wang, Lina
    Song, Yingjie
    Fan, Deming
    Zhang, Caiming
    Chen, Xiaobo
    APPLIED SOFT COMPUTING, 2022, 126
  • [29] Bayesian Random Forest with Multiple Imputation by Chain Equations for High-Dimensional Missing Data: A Simulation Study
    Olaniran, Oyebayo Ridwan
    Alzahrani, Ali Rashash R.
    MATHEMATICS, 2025, 13 (06)
  • [30] Clustering-Based Hybrid Approach for Multivariate Missing Data Imputation
    Dubey, Aditya
    Rasool, Akhtar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (11) : 710 - 714