Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures

Cited by: 9
Authors
Faucheux, Lilith [1, 2]
Resche-Rigon, Matthieu [1, 3]
Curis, Emmanuel [3, 4]
Soumelis, Vassili [2, 5]
Chevret, Sylvie [1, 3]
Affiliations
[1] Univ Paris, Sorbonne Paris Cite, ECSTRRA Team, INSERM UMR1153, Paris, France
[2] Univ Paris, Sorbonne Paris Cite, INSERM U976, Paris, France
[3] Hop St Louis, AP HP, Serv Biostat & Informat Med, Paris, France
[4] Univ Paris, Sorbonne Paris Cite, Lab Biomath Plateau IB2 EA 7537 BioSTM, Fac Pharm, Paris, France
[5] Hop St Louis, AP HP, Lab Immunol Biol & Histocompatibil, Paris, France
Keywords
breast cancer; clustering; consensus; left-censored data; missing data; multiple imputation; LIMIT; QUANTIFICATION; INFERENCE; IMPACT
DOI
10.1002/bimj.201900366
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left-censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete-case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left-censoring and do not allow the number of clusters to vary across the partitions of each imputed dataset. Here, we developed a consensus-based clustering algorithm in which left-censored data are taken into account using a modified multiple imputation method and the number of clusters is estimated for each imputed dataset. A simulation study was conducted to assess the performance in terms of the number of clusters, the percentage of unclassified observations, and the adjusted Rand index. The simulation results showed that the investigated method works well compared to several alternative approaches. A real-world application in breast cancer patients showed that the proposed method may reveal novel clusters of patients.
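The adjusted Rand index named in the abstract as a performance measure can be illustrated with a minimal sketch. This is the standard pair-counting formula written in plain Python, not code from the paper; the function name `adjusted_rand_index` and the toy label vectors are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items.

    Illustrative helper (hypothetical name), using the standard
    pair-counting definition of the adjusted Rand index.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same observations")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two observations: the partitions trivially agree.
        return 1.0

    # Pairs placed together in both partitions (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (table margins).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy partitions (made-up labels): same grouping under different label names.
reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]
print(adjusted_rand_index(reference, relabelled))
```

The index depends only on which observations are grouped together, not on the label values themselves, so it can compare a recovered partition with a reference partition even when the cluster labels differ.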
Pages: 372-393
Number of pages: 22
Related papers
50 in total
  • [31] A novel clustering-based purity and distance imputation for handling medical data with missing values
    Cheng, Ching-Hsue
    Huang, Shu-Fen
    SOFT COMPUTING, 2021, 25 (17) : 11781 - 11801
  • [32] A case study to examine the imputation of missing data to improve clustering analysis of building electrical demand
    Inman, Daniel
    Elmore, Ryan
    Bush, Brian
    BUILDING SERVICES ENGINEERING RESEARCH & TECHNOLOGY, 2015, 36 (05) : 628 - 637
  • [33] Multiple imputation of censored survival data in the presence of missing covariates using restricted mean survival time
    Grover, Gurprit
    Gupta, Vinay K.
    JOURNAL OF APPLIED STATISTICS, 2015, 42 (04) : 817 - 827
  • [34] A multiple imputation method based on weighted quantile regression models for longitudinal censored biomarker data with missing values at early visits
    Lee, MinJae
    Rahbar, Mohammad H.
    Brown, Matthew
    Gensler, Lianne
    Weisman, Michael
    Diekman, Laura
    Reveille, John D.
    BMC MEDICAL RESEARCH METHODOLOGY, 2018, 18
  • [36] Imputation Method Based on Collaborative Filtering and Clustering for the Missing Data of the Squeeze Casting Process Parameters
    Deng, Jianxin
    Ye, Zhixing
    Shan, Lubao
    You, Dongdong
    Liu, Guangming
    INTEGRATING MATERIALS AND MANUFACTURING INNOVATION, 2022, 11 (01) : 95 - 108
  • [37] Investigating Parallel Analysis in the Context of Missing Data: A Simulation Study Comparing Six Missing Data Methods
    Goretzko, David
    Heumann, Christian
    Buehner, Markus
    EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 2020, 80 (04) : 756 - 774
  • [38] Reference-based multiple imputation for missing data sensitivity analyses in trial-based cost-effectiveness analysis
    Leurent, Baptiste
    Gomes, Manuel
    Cro, Suzie
    Wiles, Nicola
    Carpenter, James R.
    HEALTH ECONOMICS, 2020, 29 (02) : 171 - 184
  • [39] Imputation and Missing Indicators for Handling Missing Longitudinal Data: Data Simulation Analysis Based on Electronic Health Record Data
    Ehrig, Molly
    Bullock, Garrett S.
    Leng, Xiaoyan Iris
    Pajewski, Nicholas M.
    Speiser, Jaime Lynn
    JMIR MEDICAL INFORMATICS, 2025, 13
  • [40] Confidence Intervals for Mean and Difference between Means of Delta-Lognormal Distributions Based on Left-Censored Data
    Thangjai, Warisa
    Niwitpong, Sa-Aat
    SYMMETRY-BASEL, 2023, 15 (06):