Selective machine learning of doubly robust functionals

Cited by: 3
Authors
Cui, Y. [1, 2]
Tchetgen Tchetgen, E. J.
Affiliations
[1] Zhejiang Univ, Sch Management, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[2] Zhejiang Univ, Ctr Data Sci, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Average treatment effect; Doubly robust functional; Influence function; Machine learning; Model selection; REGULARIZED CALIBRATED ESTIMATION; MISSING DATA; INFERENCE; REGRESSION; ESTIMATOR; NONRESPONSE; EFFICIENCY; MODELS;
DOI
10.1093/biomet/asad055
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce a new selection criterion aimed at bias reduction in estimating the functional of interest, based on a novel definition of pseudo risk inspired by the double robustness property. Intuitively, the proposed criterion selects a pair of learners with the smallest pseudo risk, so that the estimated functional is least sensitive to perturbations of a nuisance parameter. We establish an oracle property for a multi-fold cross-validation version of the new selection criterion, stating that our empirical criterion performs nearly as well as an oracle with a priori knowledge of the pseudo risk for each pair of candidate learners. Finally, we apply the approach to selecting a semiparametric estimator of the average treatment effect, given an ensemble of candidate machine learners that account for confounding in an observational study; we illustrate the approach in simulations and a data application.
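The intuition behind the selection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the criterion defined in the paper: the target is simplified to the treated-arm mean E[Y(1)] estimated by the standard augmented inverse probability weighted (AIPW) formula, the stand-in "pseudo risk" of a pair is taken to be the largest squared change in the estimate when one nuisance candidate is swapped for another, and the multi-fold cross-validation step is omitted. The function names `aipw_mean` and `select_pair`, the simulated data, and the candidate nuisance fits are all hypothetical.

```python
import numpy as np


def aipw_mean(y, a, pi_hat, mu_hat):
    """AIPW (doubly robust) estimate of E[Y(1)] from fitted nuisances."""
    return float(np.mean(mu_hat + a * (y - mu_hat) / pi_hat))


def select_pair(y, a, pi_candidates, mu_candidates):
    """Pick the (propensity, outcome) pair whose estimate moves least
    when either nuisance is swapped for another candidate.

    The risk used here is an illustrative stand-in, not the paper's
    pseudo risk, and no cross-validation is performed.
    """
    psi = np.array([[aipw_mean(y, a, p, m) for m in mu_candidates]
                    for p in pi_candidates])
    n_pi, n_mu = psi.shape
    risk = np.empty((n_pi, n_mu))
    for k in range(n_pi):
        for l in range(n_mu):
            # Largest squared change when the propensity candidate is swapped.
            d_pi = np.max((psi[k, l] - psi[:, l]) ** 2)
            # Largest squared change when the outcome candidate is swapped.
            d_mu = np.max((psi[k, l] - psi[k, :]) ** 2)
            risk[k, l] = max(d_pi, d_mu)
    idx = np.unravel_index(np.argmin(risk), risk.shape)
    return tuple(int(i) for i in idx), risk


# Simulated data (assumed design): one confounder x, true E[Y(1)] = 1.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, pi_true)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Candidate 0 is well specified, candidate 1 ignores the confounder.
pi_candidates = [pi_true, np.full(n, 0.5)]
mu_candidates = [1.0 + 2.0 * x, np.zeros(n)]

best, risk = select_pair(y, a, pi_candidates, mu_candidates)
print("selected pair:", best)
```

In this toy setup the pair of well-specified candidates is the only one whose estimate stays close to its neighbours in both directions, because double robustness keeps the estimate near the truth whenever at least one nuisance is correct. Any pair containing a misspecified candidate sits next to the doubly misspecified estimate and therefore receives a large risk.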
Pages: 517-535
Page count: 19