Selective machine learning of doubly robust functionals

Times cited: 3
Authors
Cui, Y. [1, 2]
Tchetgen Tchetgen, E. J.
Affiliations
[1] Zhejiang Univ, Sch Management, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[2] Zhejiang Univ, Ctr Data Sci, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Average treatment effect; Doubly robust functional; Influence function; Machine learning; Model selection; Regularized calibrated estimation; Missing data; Inference; Regression; Estimator; Nonresponse; Efficiency; Models
DOI
10.1093/biomet/asad055
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce a new selection criterion aimed at bias reduction in estimating the functional of interest based on a novel definition of pseudo risk inspired by the double robustness property. Intuitively, the proposed criterion selects a pair of learners with the smallest pseudo risk, so that the estimated functional is least sensitive to perturbations of a nuisance parameter. We establish an oracle property for a multi-fold cross-validation version of the new selection criterion, which states that our empirical criterion performs nearly as well as an oracle with a priori knowledge of the pseudo risk for each pair of candidate learners. Finally, we apply the approach to model selection of a semiparametric estimator of the average treatment effect, given an ensemble of candidate machine learners used to account for confounding in an observational study, and we illustrate it in simulations and a data application.
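The abstract describes the selection rule only in words. As a rough illustration, assuming the functional is the average treatment effect estimated with the standard augmented inverse probability weighting (AIPW) doubly robust estimating function, the sketch below computes the estimate for every pair of candidate propensity and outcome learners on each validation fold and picks the pair whose estimate moves least when one nuisance learner is swapped for another. The specific `pseudo_risk` form (worst squared shift over propensity swaps plus worst squared shift over outcome swaps), the function names `aipw_ate`, `pseudo_risk` and `select_pair`, and the requirement that out-of-fold predictions are already computed are assumptions made for illustration, not the paper's exact definitions.

```python
from itertools import product
from typing import Dict, List, Sequence


def aipw_ate(y: Sequence[float], a: Sequence[int], mu1: Sequence[float],
             mu0: Sequence[float], pi: Sequence[float]) -> float:
    """Doubly robust (AIPW) estimate of the average treatment effect.

    mu1[i], mu0[i]: predicted E[Y | A=1, X_i] and E[Y | A=0, X_i];
    pi[i]: predicted propensity P(A=1 | X_i), assumed strictly inside (0, 1).
    """
    terms = [
        m1 - m0 + ai * (yi - m1) / p - (1 - ai) * (yi - m0) / (1 - p)
        for yi, ai, m1, m0, p in zip(y, a, mu1, mu0, pi)
    ]
    return sum(terms) / len(terms)


def pseudo_risk(psi: Dict[str, Dict[str, float]], j: str, k: str) -> float:
    """Illustrative (assumed) pseudo risk of pair (j, k): the worst squared
    shift of the estimate when only the propensity learner is swapped, plus
    the worst squared shift when only the outcome learner is swapped."""
    shift_ps = max((psi[j][k] - psi[jj][k]) ** 2 for jj in psi)
    shift_om = max((psi[j][k] - psi[j][kk]) ** 2 for kk in psi[j])
    return shift_ps + shift_om


def _subset(xs: Sequence[float], idx: List[int]) -> List[float]:
    """Select the entries of xs at positions idx."""
    return [xs[i] for i in idx]


def select_pair(y, a, folds, ps_preds, om_preds):
    """Return the (propensity, outcome) learner pair with the smallest
    fold-averaged pseudo risk, together with the risk of every pair.

    ps_preds[j]: out-of-fold propensity predictions of candidate j.
    om_preds[k]: (mu1, mu0) out-of-fold outcome predictions of candidate k.
    folds[i]: validation-fold label of observation i.
    """
    pairs = list(product(ps_preds, om_preds))
    fold_ids = sorted(set(folds))
    risk = {pair: 0.0 for pair in pairs}
    for v in fold_ids:
        idx = [i for i, f in enumerate(folds) if f == v]
        ys, as_ = _subset(y, idx), _subset(a, idx)
        # AIPW estimate on validation fold v for every candidate pair.
        psi = {
            j: {
                k: aipw_ate(ys, as_, _subset(om_preds[k][0], idx),
                            _subset(om_preds[k][1], idx),
                            _subset(ps_preds[j], idx))
                for k in om_preds
            }
            for j in ps_preds
        }
        # Average each pair's pseudo risk over the validation folds.
        for j, k in pairs:
            risk[(j, k)] += pseudo_risk(psi, j, k) / len(fold_ids)
    best = min(pairs, key=risk.__getitem__)
    return best, risk
```

Fitting the candidate learners (for example, logistic regression or random forests on the training folds) is left to the caller; the sketch only scores pairs. The intuition mirrors the abstract: if both nuisance estimates in a pair are accurate, swapping either one for another candidate leaves the AIPW estimate close to the truth by double robustness, so that pair's pseudo risk stays small.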
Pages: 517-535
Number of pages: 19