Random forests for the analysis of matched case-control studies

Cited by: 0
Authors
Schauberger, Gunther [1 ]
Klug, Stefanie J. [1 ]
Berger, Moritz [2 ]
Affiliations
[1] Tech Univ Munich, Chair Epidemiol, TUM Sch Med & Hlth, Munich, Germany
[2] Univ Bonn, Fac Med, Inst Med Biometry Informat & Epidemiol, Bonn, Germany
Source
BMC BIOINFORMATICS | 2024 / Vol. 25 / Issue 01
Keywords
Conditional logistic regression; Conditional logistic regression forests; Matched case-control studies; Machine learning; Random forest; CLogitForest; RISK;
DOI
10.1186/s12859-024-05877-5
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Conditional logistic regression trees have been proposed as a flexible alternative to the standard method of conditional logistic regression for the analysis of matched case-control studies. While they avoid the strict assumption of linearity and automatically incorporate interactions, conditional logistic regression trees may suffer from relatively high variability. Further machine learning methods for the analysis of matched case-control studies are missing because conventional machine learning methods cannot handle the matched structure of the data.
Results: A random forest method for the analysis of matched case-control studies based on conditional logistic regression trees is proposed, which overcomes the issue of high variability. It provides an accurate estimation of exposure effects while being more flexible in the functional form of covariate effects. The efficacy of the method is illustrated in a simulation study and in an application to real-world data from a matched case-control study on the effect of regular participation in cervical cancer screening on the development of cervical cancer.
Conclusions: The proposed random forest method is a promising addition to the toolbox for the analysis of matched case-control studies and addresses the need for machine learning methods in this field. It provides a more flexible approach than both the standard method of conditional logistic regression and conditional logistic regression trees. It allows for non-linearity and the automatic inclusion of interaction effects and is suitable for both exploratory and explanatory analyses.
Pages: 22
Related papers
50 in total
  • [1] Matched Versus Unmatched Analysis of Matched Case-Control Studies
    Wan, Fei
    Colditz, Graham A.
    Sutcliffe, Siobhan
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 2021, 190 (09) : 1859 - 1866
  • [2] Bayesian Variable Selection Methods for Matched Case-Control Studies
    Asafu-Adjei, Josephine
    Tadesse, Mahlet G.
    Coull, Brent
    Balasubramanian, Raji
    Lev, Michael
    Schwamm, Lee
    Betensky, Rebecca
    INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2017, 13 (01)
  • [3] A tree-based modeling approach for matched case-control studies
    Schauberger, Gunther
    Tanaka, Luana Fiengo
    Berger, Moritz
    STATISTICS IN MEDICINE, 2023, 42 (05) : 676 - 692
  • [4] Matched case control studies with random exposure effects
    Chowdhury, SR
    McGilchrist, CA
    BIOMETRICAL JOURNAL, 2001, 43 (03) : 271 - 281
  • [5] Pooled Exposure Assessment for Matched Case-control Studies
    Saha-Chaudhuri, Paramita
    Umbach, David M.
    Weinberg, Clarice R.
    EPIDEMIOLOGY, 2011, 22 (05) : 704 - 712
  • [6] Variable importance in matched case-control studies in settings of high dimensional data
    Balasubramanian, Raji
    Houseman, E. Andres
    Coull, Brent A.
    Lev, Michael H.
    Schwamm, Lee H.
    Betensky, Rebecca A.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2014, 63 (04) : 639 - 655
  • [7] A flexible matching strategy for matched nested case-control studies
    Ratanatharathorn, Andrew
    Mooney, Stephen J.
    Rybicki, Benjamin A.
    Rundle, Andrew G.
    ANNALS OF EPIDEMIOLOGY, 2023, 86 : 49 - +
  • [8] Bayesian analysis of pair-matched case-control studies subject to outcome misclassification
    Hogg, Tanja
    Petkau, John
    Zhao, Yinshan
    Gustafson, Paul
    Wijnands, Jose M. A.
    Tremlett, Helen
    STATISTICS IN MEDICINE, 2017, 36 (26) : 4196 - 4213
  • [9] Covariate measurement error adjustment for matched case-control studies
    McShane, LM
    Midthune, DN
    Dorgan, JF
    Freedman, LS
    Carroll, RJ
    BIOMETRICS, 2001, 57 (01) : 62 - 73
  • [10] Spline Analysis of Biomarker Data Pooled from Multiple Matched/Nested Case-Control Studies
    Wu, Yujie
    Gail, Mitchell
    Smith-Warner, Stephanie
    Ziegler, Regina
    Wang, Molin
    CANCERS, 2022, 14 (11)