Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures

Times cited: 9
Authors
Faucheux, Lilith [1,2]
Resche-Rigon, Matthieu [1,3]
Curis, Emmanuel [3,4]
Soumelis, Vassili [2,5]
Chevret, Sylvie [1,3]
Affiliations
[1] Univ Paris, Sorbonne Paris Cite, ECSTRRA Team, INSERM UMR1153, Paris, France
[2] Univ Paris, Sorbonne Paris Cite, INSERM U976, Paris, France
[3] Hop St Louis, AP HP, Serv Biostat & Informat Med, Paris, France
[4] Univ Paris, Sorbonne Paris Cite, Lab Biomath Plateau IB2 EA 7537 BioSTM, Fac Pharm, Paris, France
[5] Hop St Louis, AP HP, Lab Immunol Biol & Histocompatibil, Paris, France
Keywords
breast cancer; clustering; consensus; left-censored data; missing data; multiple imputation; LIMIT; QUANTIFICATION; INFERENCE; IMPACT
DOI
10.1002/bimj.201900366
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left-censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete-case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left-censoring and do not allow the number of clusters to vary across the partitions of each imputed dataset. Here, we developed a consensus-based clustering algorithm in which left-censored data are taken into account using a modified multiple imputation method and the number of clusters is estimated for each imputed dataset. A simulation study was conducted to assess the performance in terms of the number of clusters, the percentage of unclassified observations, and the adjusted Rand index. The simulation results showed that the investigated method works well compared to several alternative approaches. A real-world application in breast cancer patients showed that the proposed method may reveal novel clusters of patients.
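The abstract describes pooling the partitions obtained from several imputed datasets, each possibly with its own number of clusters, into a single consensus, and scoring agreement with the adjusted Rand index. As a hedged illustration only, and not the authors' algorithm, the sketch below assumes a simple co-association rule: two observations are linked when they share a cluster in more than a threshold fraction of the partitions, consensus clusters are the connected components of that graph, and observations with no stable partner are left unclassified (label -1). The function names, the 0.5 default threshold, and the connected-components rule are assumptions made for this example; the adjusted Rand index follows the standard Hubert and Arabie formula.

```python
from collections import Counter, deque
from itertools import combinations
from math import comb


def co_association_counts(partitions):
    """Count, for each pair (i, j), the partitions placing i and j together.

    `partitions` holds one label list per imputed dataset. Labels are only
    compared within a partition, so the number of clusters and the label
    names may differ from one imputed dataset to the next.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must cover the same observations")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts


def consensus_labels(partitions, threshold=0.5):
    """Hypothetical consensus rule (an assumption, not the paper's method):
    link i and j when they co-cluster in more than `threshold` of the
    partitions, then take connected components. Observations without any
    link are returned as -1 (unclassified)."""
    m = len(partitions)
    n = len(partitions[0])
    counts = co_association_counts(partitions)
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        # Breadth-first search over the thresholded co-association graph.
        component = [start]
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if labels[j] is None and counts[i][j] > threshold * m:
                    labels[j] = next_label
                    component.append(j)
                    queue.append(j)
        if len(component) == 1:
            labels[start] = -1  # no stable partner: unclassified
        else:
            next_label += 1
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert and Arabie) between two label vectors.

    A label such as -1 is treated as an ordinary cluster here; drop
    unclassified observations first if they should not count.
    """
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two observations")
    pair_count = comb(n, 2)
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / pair_count
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are the same trivial split
        # (all singletons, or everything in one cluster).
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    partitions = [
        [0, 0, 0, 1, 1, 1, 2],
        [0, 0, 0, 1, 1, 2, 0],  # three clusters in this imputed dataset
        [1, 1, 1, 0, 0, 0, 0],  # label names swapped; observation 6 moves
    ]
    print(consensus_labels(partitions))  # [0, 0, 0, 1, 1, 1, -1]
    print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))  # 1.0
```

In the usage example, observation 6 shares a cluster with any given partner in only one of the three partitions, so under this assumed rule it ends up unclassified, while the other six observations form two stable consensus clusters even though the second imputed dataset yields three clusters.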
Pages: 372-393
Number of pages: 22