Tuning Random Forests for Causal Inference under Cluster-Level Unmeasured Confounding

Times cited: 6
Authors
Suk, Youmi [1]
Kang, Hyunseung [2]
Affiliations
[1] Univ Virginia, Sch Data Sci, 31 Bonnycastle Dr, Charlottesville, VA 22903 USA
[2] Univ Wisconsin Madison, Dept Stat, Madison, WI USA
Keywords
Causal inference; machine learning methods; unmeasured variables; omitted variable bias; fixed effects models; propensity score; heterogeneity; exposure
DOI
10.1080/00273171.2021.1994364
Chinese Library Classification number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
Recently, there has been growing interest in using machine learning methods for causal inference because they can model the propensity score and the outcome automatically and flexibly. However, almost all machine learning methods for causal inference have been studied under the assumption of no unmeasured confounding, and there is little work on handling omitted/unmeasured variable bias. This paper focuses on a machine learning method based on random forests known as Causal Forests and presents five simple modifications for tuning Causal Forests so that they are robust to cluster-level unmeasured confounding. Our simulation study finds that adjusting the default tuning procedure with the propensity score from fixed effects logistic regression, or using variables that are centered to their cluster means, produces estimates that are more robust to cluster-level unmeasured confounding. Also, when these parametric propensity score models are misspecified, our modified machine learning methods remain more robust to bias from cluster-level unmeasured confounders than existing parametric approaches based on propensity score weighting. We conclude by demonstrating our proposals in a real-data study of the effect of taking an eighth-grade algebra course on math achievement scores, using data from the Early Childhood Longitudinal Study.
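One of the remedies named in the abstract is to use variables centered to their cluster means. The minimal sketch below illustrates why this helps, assuming the cluster-level unmeasured confounder enters additively (every unit in cluster j is shifted by the same amount u_j). The helper `cluster_mean_center` and the toy data are hypothetical illustrations, not the authors' code or part of any Causal Forest library; in the paper's setting, centered variables would be supplied to the forest in place of the raw ones.

```python
from collections import defaultdict


def cluster_mean_center(values, clusters):
    """Subtract each cluster's mean from its members (the 'within' transformation).

    values:   list of floats, one per unit
    clusters: list of hashable cluster labels, same length as values
    Returns a new list of centered values.
    """
    if len(values) != len(clusters):
        raise ValueError("values and clusters must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, clusters):
        sums[c] += v
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [v - means[c] for v, c in zip(values, clusters)]


# Hypothetical toy data: two schools (clusters). Each unit has its own
# variation e_ij, plus an unmeasured cluster-level shift u_j.
clusters = ["A", "A", "A", "B", "B", "B"]
noise = [1.0, 2.0, 3.0, 4.0, 0.0, 2.0]   # unit-level variation e_ij
u = {"A": 10.0, "B": -5.0}              # unmeasured cluster effect u_j
x = [e + u[c] for e, c in zip(noise, clusters)]

x_centered = cluster_mean_center(x, clusters)
noise_centered = cluster_mean_center(noise, clusters)

# Centering removes u_j entirely: x_centered equals the centered noise.
print(x_centered)  # [-1.0, 0.0, 1.0, 2.0, -2.0, 0.0]
```

The design point is that any term constant within a cluster cancels in x_ij minus the cluster mean, whatever its size, so the forest never sees it. The same cancellation does not hold if u_j acts non-additively (for example, through an interaction with a unit-level covariate), which is one reason an alternative such as a fixed effects propensity score may also be worth comparing.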
Pages: 408-440
Number of pages: 33