Assessing Methods for Generalizing Experimental Impact Estimates to Target Populations

Cited by: 114
Authors
Kern, Holger L. [1]
Stuart, Elizabeth A. [2]
Hill, Jennifer [3]
Green, Donald P. [4]
Affiliations
[1] Florida State Univ, Tallahassee, FL 32306 USA
[2] Johns Hopkins Univ, Baltimore, MD 21205 USA
[3] NYU, New York, NY USA
[4] Columbia Univ, New York, NY USA
Funding
National Science Foundation (USA)
Keywords
Bayesian Additive Regression Trees; external validity; generalizability; propensity score; weighting; causal inference; selection; bias
DOI
10.1080/19345747.2015.1060282
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification code
040101; 120403
Abstract
Randomized experiments are considered the gold standard for causal inference because they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research, increasing attention is being paid to the potential lack of generalizability of randomized experiments because the experimental participants may be unrepresentative of the target population of interest. This article examines whether generalization may be assisted by statistical methods that adjust for observed differences between the experimental participants and members of a target population. The methods examined include approaches that reweight the experimental data so that participants more closely resemble the target population and methods that utilize models of the outcome. Two simulation studies and one empirical analysis investigate and compare the methods' performance. One simulation uses purely simulated data while the other utilizes data from an evaluation of a school-based dropout prevention program. Our simulations suggest that machine learning methods outperform regression-based methods when the required structural (ignorability) assumptions are satisfied. When these assumptions are violated, all of the methods examined perform poorly. Our empirical analysis uses data from a multisite experiment to assess how well results from a given site predict impacts in other sites. Using a variety of extrapolation methods, predicted effects for each site are compared to actual benchmarks. Flexible modeling approaches perform best, although linear regression is not far behind. Taken together, these results suggest that flexible modeling techniques can aid generalization while underscoring the fact that even state-of-the-art statistical techniques still rely on strong assumptions.
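The reweighting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the article's methods estimate sample membership from many covariates, whereas this sketch assumes a single discrete covariate and estimates the inverse odds of trial membership from simple cell counts. The function names `inverse_odds_weights` and `weighted_ate` and all of the data are hypothetical, chosen only for illustration.

```python
import numpy as np


def inverse_odds_weights(x_trial, x_target):
    """Weight each trial unit by (target count) / (trial count) in its covariate cell.

    Assumption: one discrete covariate, so cell counts stand in for an
    estimated propensity (sample membership) model.
    """
    weights = np.zeros(len(x_trial), dtype=float)
    for level in np.unique(x_trial):
        n_trial = np.sum(x_trial == level)
        n_target = np.sum(x_target == level)
        weights[x_trial == level] = n_target / n_trial
    # Cells that appear only in the target population cannot be reweighted
    # at all; that case is an overlap (positivity) violation.
    return weights


def weighted_ate(y, treated, weights):
    """Difference in weighted outcome means between treated and control units."""
    t = treated.astype(bool)
    return (np.average(y[t], weights=weights[t])
            - np.average(y[~t], weights=weights[~t]))


# Made-up trial: the covariate value 0 is over-represented relative to the target.
x_trial = np.array([0, 0, 0, 0, 0, 0, 1, 1])
treated = np.array([1, 1, 1, 0, 0, 0, 1, 0])
# Hypothetical effect: 1.0 when x == 0 and 3.0 when x == 1; control outcomes are 0.
y = treated * np.where(x_trial == 1, 3.0, 1.0)

# Made-up target population, in which the covariate value 1 dominates.
x_target = np.array([0, 0, 1, 1, 1, 1, 1, 1])

weights = inverse_odds_weights(x_trial, x_target)
naive = y[treated == 1].mean() - y[treated == 0].mean()
adjusted = weighted_ate(y, treated, weights)
print(f"unweighted trial estimate: {naive:.2f}")
print(f"reweighted estimate for the target: {adjusted:.2f}")
```

In this toy setup the unweighted estimate reflects the trial's covariate mix, while the reweighted estimate matches the average effect implied by the target's mix. That only works because the single covariate captures all effect heterogeneity, which is the kind of ignorability assumption the abstract cautions about.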
Pages: 103-127
Number of pages: 25