Selective machine learning of doubly robust functionals

Cited by: 3

Authors:

Cui, Y. [1,2]

Tchetgen Tchetgen, E. J.

Affiliations:

[1] Zhejiang Univ, Sch Management, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China

[2] Zhejiang Univ, Ctr Data Sci, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China

Source:

BIOMETRIKA | 2024 / Vol. 111 / No. 2

Funding:

National Natural Science Foundation of China;

Keywords:

Average treatment effect; Doubly robust functional; Influence function; Machine learning; Model selection; Regularized calibrated estimation; Missing data; Inference; Regression; Estimator; Nonresponse; Efficiency; Models

DOI:

10.1093/biomet/asad055

Chinese Library Classification number:

Q [Biological Sciences];

Subject classification code:

07 ; 0710 ; 09 ;

Abstract:

While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce a new selection criterion aimed at bias reduction in estimating the functional of interest based on a novel definition of pseudo risk inspired by the double robustness property. Intuitively, the proposed criterion selects a pair of learners with the smallest pseudo risk, so that the estimated functional is least sensitive to perturbations of a nuisance parameter. We establish an oracle property for a multi-fold cross-validation version of the new selection criterion that states that our empirical criterion performs nearly as well as an oracle with a priori knowledge of the pseudo risk for each pair of candidate learners. Finally, we apply the approach to model selection of a semiparametric estimator of average treatment effect given an ensemble of candidate machine learners to account for confounding in an observational study that we illustrate in simulations and a data application.

Citation

Pages: 517-535

Page count: 19

53 references in total

[1] [Anonymous], 2024, R: A Language and Environment for Statistical Computing
[2] Austern Morgane, 2020, arXiv
[3] Doubly robust estimation in missing data and causal inference models
Bang, H
[J]. BIOMETRICS, 2005, 61 (04) : 962 - 972
[4] A Nonparametric Super-Efficient Estimator of the Average Treatment Effect
Benkeser, David
Cai, Weixin
van der Laan, Mark J.
[J]. STATISTICAL SCIENCE, 2020, 35 (03) : 484 - 495
[5] Bickel P., 1993, Efficient and adaptive estimation for semiparametric models, Vol. 4
[6] Random forests
Breiman, L
[J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
[7] Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data
Cao, Weihua
Tsiatis, Anastasios A.
Davidian, Marie
[J]. BIOMETRIKA, 2009, 96 (03) : 723 - 734
[8] A simple multiply robust estimator for missing response problem
Chan, Kwun Chuen Gary
[J]. STAT, 2013, 2 (01) : 143 - 149
[9] Oracle, Multiple Robust and Multipurpose Calibration in a Missing Response Problem
Chan, Kwun Chuen Gary
Yam, Sheung Chi Phillip
[J]. STATISTICAL SCIENCE, 2014, 29 (03) : 380 - 396
[10] Multiply robust imputation procedures for the treatment of item nonresponse in surveys
Chen, Sixia
Haziza, David
[J]. BIOMETRIKA, 2017, 104 (02) : 439 - 453
