Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases

Cited by: 65
Authors
Wendling, T. [1]
Jung, K. [2]
Callahan, A. [2]
Schuler, A. [2]
Shah, N. H. [2]
Gallego, B. [1]
Affiliations
[1] Macquarie Univ, Australian Inst Hlth Innovat, Ctr Hlth Informat, Sydney, NSW, Australia
[2] Stanford Univ, Stanford Ctr Biomed Informat Res, Stanford, CA 94305 USA
Funding
UK Medical Research Council
Keywords
health care databases; heterogeneous treatment effects; machine learning; propensity score; simulation; 30-DAY READMISSIONS; GREEN BUTTON; CAUSAL; MODELS;
DOI
10.1002/sim.7820
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in real-world conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies.
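For a binary treatment W and binary outcome Y, the conditional risk difference at covariates x is P(Y=1 | X=x, W=1) − P(Y=1 | X=x, W=0). The simulation design described in the abstract (keep covariates and treatment assignment, simulate only the outcomes from models of the observed outcomes) can be sketched roughly as follows. This is an illustration only, not the authors' code: the data are synthetic stand-ins for real records, the function name `simulate_outcomes` is hypothetical, and stratum-by-arm event rates are used as a crude placeholder for the nonparametric outcome models the abstract mentions.

```python
import numpy as np


def simulate_outcomes(strata, treatment, outcome, rng):
    """Simulate binary outcomes from stratum-by-arm event rates.

    strata    : integer stratum label per patient (assumed stand-in for
                a nonparametric outcome model over the covariates)
    treatment : observed binary treatment assignment (kept as-is)
    outcome   : observed binary outcome (used only to fit the rates)
    Returns (simulated outcomes, true conditional risk difference).
    """
    n_strata = strata.max() + 1
    risk = np.zeros((n_strata, 2))
    for s in range(n_strata):
        for w in (0, 1):
            mask = (strata == s) & (treatment == w)
            # Fall back to the overall event rate if a cell is empty.
            risk[s, w] = outcome[mask].mean() if mask.any() else outcome.mean()

    p0 = risk[strata, 0]  # simulated risk if untreated
    p1 = risk[strata, 1]  # simulated risk if treated
    true_crd = p1 - p0    # known ground truth for each patient

    # Outcomes are drawn under the treatment each patient actually received.
    p_obs = np.where(treatment == 1, p1, p0)
    y_sim = rng.binomial(1, p_obs)
    return y_sim, true_crd


# Synthetic stand-in for covariates, treatment, and outcomes.
rng = np.random.default_rng(0)
n = 5000
strata = rng.integers(0, 4, size=n)
treatment = rng.binomial(1, 0.2 + 0.15 * strata)  # confounded assignment
outcome = rng.binomial(1, 0.05 + 0.02 * strata + 0.03 * treatment)

y_sim, true_crd = simulate_outcomes(strata, treatment, outcome, rng)
```

Because the outcome-generating risks are known in such a design, the bias of any estimator of the conditional risk difference can be computed directly by comparing its estimates against `true_crd`.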
Pages: 3309-3324
Number of pages: 16