Testing a global null hypothesis using ensemble machine learning methods

Cited by: 0
Authors
Han, Sunwoo [1]
Fong, Youyi [1]
Huang, Ying [1]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, 1124 Columbia St, Seattle, WA 98104 USA
Funding
National Institutes of Health (USA)
Keywords
hypothesis test; vaccine efficacy trial; cross validation; AUC; random forest; stacking; HIV-1 VACCINE EFFICACY; ADAPTIVE RESAMPLING TEST; MULTIPLE TESTS; REGRESSION; INFERENCE; CURVE; AREA;
DOI
10.1002/sim.9362
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Testing a global null hypothesis that there are no significant predictors for a binary outcome of interest among a large set of biomarker measurements is an important task in biomedical studies. We seek to improve the power of such testing methods by leveraging ensemble machine learning methods. Ensemble machine learning methods such as random forest, bagging, and adaptive boosting model the relationship between the outcome and the predictor nonparametrically, while stacking combines the strength of multiple learners. We demonstrate the power of the proposed testing methods through Monte Carlo studies and show the use of the methods by applying them to the immunologic biomarkers dataset from the RV144 HIV vaccine efficacy trial.
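The abstract does not spell out the test statistic or how its null distribution is obtained, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes a permutation test whose statistic is the cross-validated AUC of a random forest (both terms appear among the keywords). The function names `cv_auc` and `global_null_pvalue`, the simulated data, and all tuning values (25 trees, 3 folds, 30 permutations) are illustrative assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_auc(X, y, seed=0):
    """Mean cross-validated AUC of a small random forest (assumed statistic)."""
    model = RandomForestClassifier(n_estimators=25, random_state=seed)
    folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean()


def global_null_pvalue(X, y, n_perm=30, seed=0):
    """Permutation p-value for H0: no predictor is associated with y."""
    rng = np.random.default_rng(seed)
    observed = cv_auc(X, y, seed)
    # Shuffling the outcome breaks any predictor-outcome association,
    # so the permuted statistics approximate the null distribution.
    null_aucs = [cv_auc(X, rng.permutation(y), seed) for _ in range(n_perm)]
    exceed = sum(a >= observed for a in null_aucs)
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated data (hypothetical): 120 subjects, 10 biomarkers.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))
# Outcome driven by the first two biomarkers plus noise.
y_signal = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=120) > 0).astype(int)
# Outcome generated independently of every biomarker.
y_null = rng.integers(0, 2, size=120)

auc_signal, p_signal = global_null_pvalue(X, y_signal)
auc_null, p_null = global_null_pvalue(X, y_null)
print(f"signal: CV-AUC={auc_signal:.2f}, p={p_signal:.3f}")
print(f"null:   CV-AUC={auc_null:.2f}, p={p_null:.3f}")
```

With 30 permutations the smallest attainable p-value is 1/31, so a real analysis would use many more permutations. A stacked learner could be swapped in for the random forest inside `cv_auc` without changing the rest of the sketch.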
Pages: 2417-2426
Number of pages: 10