Collaborative-controlled LASSO for constructing propensity score-based estimators in high-dimensional data

Cited by: 30
Authors
Ju, Cheng [1]
Wyss, Richard [2,3]
Franklin, Jessica M. [2,3]
Schneeweiss, Sebastian [2,3]
Haggstrom, Jenny [4]
van der Laan, Mark J. [1]
Affiliations
[1] Univ Calif Berkeley, Div Biostat, Berkeley, CA 94720 USA
[2] Brigham & Womens Hosp, Dept Med, Div Pharmacoepidemiol & Pharmacoecon, Boston, MA 02115 USA
[3] Harvard Med Sch, Boston, MA 02115 USA
[4] Umea Univ, USBE, Dept Stat, Umea, Sweden
Funding
Swedish Research Council
Keywords
Propensity score; average treatment effect; LASSO; model selection; electronic healthcare database; collaborative targeted minimum loss-based estimation; CONFOUNDING ADJUSTMENT; REGRESSION; PERFORMANCE; ROBUSTNESS; ALGORITHM; INFERENCE; SELECTION; MODELS; SAFETY; DRUG;
DOI
10.1177/0962280217744588
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services management)]
Abstract
Propensity score-based estimators are increasingly used for causal inference in observational studies. However, model selection for propensity score estimation in high-dimensional data has received little attention. In these settings, propensity score models have traditionally been selected based on the goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collaborative minimum loss-based estimation is a novel methodology for causal inference that takes into account information on the causal parameter of interest when selecting a propensity score model. This "collaborative learning" considers variable associations with both treatment and outcome when selecting a propensity score model in order to minimize a bias-variance tradeoff in the estimated treatment effect. In this study, we introduce a novel approach for collaborative model selection when using the LASSO estimator for propensity score estimation in high-dimensional covariate settings. To demonstrate the importance of selecting the propensity score model collaboratively, we designed quasi-experiments based on a real electronic healthcare database, where only the potential outcomes were manually generated, and the treatment and baseline covariates remained unchanged. Results showed that the collaborative minimum loss-based estimation algorithm outperformed other competing estimators for both point estimation and confidence interval coverage. In addition, the propensity score model selected by collaborative minimum loss-based estimation could be applied to other propensity score-based estimators, which also resulted in substantive improvement for both point estimation and confidence interval coverage. We illustrate the discussed concepts through an empirical example comparing the effects of non-selective nonsteroidal anti-inflammatory drugs with selective COX-2 inhibitors on gastrointestinal complications in a population of Medicare beneficiaries.
Pages: 1044-1063
Page count: 20