Comparison of machine learning clustering algorithms for detecting heterogeneity of treatment effect in acute respiratory distress syndrome: A secondary analysis of three randomised controlled trials

Cited by: 28
Authors
Sinha, Pratik [1]
Spicer, Alexandra [2]
Delucchi, Kevin L. [3, 4]
McAuley, Daniel F. [5, 6]
Calfee, Carolyn S. [4, 7, 8]
Churpek, Matthew M. [2]
Affiliations
[1] Washington Univ, Sch Med, Div Clin & Translat Res, Div Crit Care,Dept Anesthesia, St Louis, MO 63110 USA
[2] Univ Wisconsin Madison, Dept Med, Madison, WI USA
[3] Dept Psychiat & Behav Sci, San Francisco, CA USA
[4] Univ Calif San Francisco, San Francisco, CA 94143 USA
[5] Queens Univ Belfast, Wellcome Wolfson Inst Expt Med, Belfast, Antrim, North Ireland
[6] Queens Univ Belfast, Reg Intens Care Unit, Wellcome Wolfson Inst Expt Med, Royal Victoria Hosp, Belfast, Antrim, North Ireland
[7] Dept Med, Div Pulm Crit Care Allergy & Sleep Med, San Francisco, CA USA
[8] Dept Anesthesia, San Francisco, CA USA
Source
EBIOMEDICINE | 2021 / Volume 74
Keywords
ARDS; RCTs; Clustering; machine learning; LCA; Heterogeneity of treatment effect; LATENT CLASS ANALYSIS; CLINICAL-TRIALS; SUBPHENOTYPES; SEPSIS;
DOI
10.1016/j.ebiom.2021.103697
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Background: Heterogeneity in Acute Respiratory Distress Syndrome (ARDS), a consequence of its non-specific definition, has led to a multitude of negative randomised controlled trials (RCTs). Investigators have sought to identify heterogeneity of treatment effect (HTE) in RCTs using clustering algorithms. We evaluated how well several commonly used machine-learning algorithms identify clusters in which HTE may be detected.
Methods: Five unsupervised algorithms (latent class analysis (LCA), K-means, partitioning around medoids, hierarchical clustering, and spectral clustering) and four supervised algorithms (model-based recursive partitioning, Causal Forest (CF), and X-learner with Random Forest (XL-RF) or with Bayesian Additive Regression Trees) were individually applied to three prior ARDS RCTs. Clinical data and research protein biomarkers were used as partitioning variables, with the latter excluded for secondary analyses. For each clustering schema, HTE was evaluated based on the interaction term of treatment group and cluster, with day-90 mortality as the dependent variable.
Findings: No single algorithm identified clusters with significant HTE in all three trials. LCA, XL-RF, and CF identified HTE most frequently (2/3 RCTs). Important partitioning variables in the unsupervised approaches were consistent across algorithms and RCTs. In supervised models, important partitioning variables varied between algorithms and across RCTs. Among algorithms whose clusters demonstrated HTE in the same trial, patients frequently switched between treatment-benefit and treatment-harm clusters from one algorithm to another. With the exception of LCA, results from all algorithms showed significant changes in cluster composition and HTE when the random seed was changed. Removing research biomarkers as partitioning variables greatly reduced the chances of detecting HTE across all algorithms.
Interpretation: Machine-learning algorithms were inconsistent in their abilities to identify clusters with significant HTE. Protein biomarkers were essential in identifying clusters with HTE. Investigations using machine-learning approaches to identify clusters to seek HTE require cautious interpretation. © 2021 The Authors. Published by Elsevier B.V.
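The abstract states that HTE was assessed through the treatment-by-cluster interaction term with day-90 mortality as the outcome. The following is a minimal sketch of one way such a test could look, not the authors' code. It assumes exactly two clusters, a binary treatment, and a logistic model with no other covariates, in which case the interaction coefficient equals the log ratio of the two within-cluster odds ratios and has a closed-form Wald standard error. The function name `interaction_test` and all counts are hypothetical and chosen only for illustration.

```python
import math

def interaction_test(counts):
    """Wald test of the treatment-by-cluster interaction on the log-odds scale.

    counts[(cluster, treated)] = (deaths, survivors), with cluster and
    treated each coded 0 or 1. All cell counts must be non-zero.
    """
    def log_odds_ratio(cluster):
        deaths_t, surv_t = counts[(cluster, 1)]   # treated arm
        deaths_c, surv_c = counts[(cluster, 0)]   # control arm
        return math.log((deaths_t * surv_c) / (surv_t * deaths_c))

    # Interaction coefficient: difference of within-cluster log odds ratios.
    log_ror = log_odds_ratio(1) - log_odds_ratio(0)
    # Wald standard error: square root of the sum of reciprocal cell counts.
    se = math.sqrt(sum(1.0 / n for cell in counts.values() for n in cell))
    z = log_ror / se
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
    return log_ror, se, p_value

# Hypothetical day-90 mortality counts (made up, not from the trials).
example_counts = {
    (0, 0): (30, 70),  # cluster 0, control
    (0, 1): (30, 70),  # cluster 0, treated
    (1, 0): (40, 60),  # cluster 1, control
    (1, 1): (20, 80),  # cluster 1, treated
}
log_ror, se, p = interaction_test(example_counts)
print(f"log ratio of odds ratios = {log_ror:.3f}, SE = {se:.3f}, p = {p:.3f}")
```

With more than two clusters or with adjustment covariates, the closed form no longer applies and a fitted regression model with an interaction term would be needed instead.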
Pages: 9
Related papers
29 in total
  • [1] A roadmap of clustering algorithms: finding a match for a biomedical application
    Andreopoulos, Bill
    An, Aijun
    Wang, Xiaogang
    Schroeder, Michael
    [J]. BRIEFINGS IN BIOINFORMATICS, 2009, 10 (03) : 297 - 314
  • [2] Recursive partitioning for heterogeneous causal effects
    Athey, Susan
    Imbens, Guido
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2016, 113 (27) : 7353 - 7360
  • [3] Brower RG, 2004, NEW ENGL J MED, V351, P327
  • [4] Three simple rules to ensure reasonably credible subgroup analyses
    Burke, James F.
    Sussman, Jeremy B.
    Kent, David M.
    Hayward, Rodney A.
    [J]. BMJ-BRITISH MEDICAL JOURNAL, 2015, 351
  • [5] Acute respiratory distress syndrome subphenotypes and differential response to simvastatin: secondary analysis of a randomised controlled trial
    Calfee, Carolyn S.
    Delucchi, Kevin L.
    Sinha, Pratik
    Matthay, Michael A.
    Hackett, Jonathan
    Shankar-Hari, Manu
    McDowell, Cliona
    Laffey, John G.
    O'Kane, Cecilia M.
    McAuley, Daniel F.
    [J]. LANCET RESPIRATORY MEDICINE, 2018, 6 (09) : 691 - 698
  • [6] Subphenotypes in acute respiratory distress syndrome: latent class analysis of data from two randomised controlled trials
    Calfee, Carolyn S.
    Delucchi, Kevin
    Parsons, Polly E.
    Thompson, B. Taylor
    Ware, Lorraine B.
    Matthay, Michael A.
    [J]. LANCET RESPIRATORY MEDICINE, 2014, 2 (08) : 611 - 620
  • [7] Acute Respiratory Distress Syndrome Subphenotypes Respond Differently to Randomized Fluid Management Strategy
    Famous, Katie R.
    Delucchi, Kevin
    Ware, Lorraine B.
    Kangelaris, Kirsten N.
    Liu, Kathleen D.
    Thompson, B. Taylor
    Calfee, Carolyn S.
    [J]. AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE, 2017, 195 (03) : 331 - 338
  • [8] Feng Shi, 2018, EMNLP
  • [9] Adapting bioinformatics curricula for big data
    Greene, Anna C.
    Giffin, Kristine A.
    Greene, Casey S.
    Moore, Jason H.
    [J]. BRIEFINGS IN BIOINFORMATICS, 2016, 17 (01) : 43 - 50
  • [10] Limitations of applying summary results of clinical trials to individual patients - The need for risk stratification
    Kent, David M.
    Hayward, Rodney A.
    [J]. JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2007, 298 (10) : 1209 - 1212