Combining Multiple Observational Data Sources to Estimate Causal Effects

Cited by: 39
Authors
Yang, Shu [1]
Ding, Peng [2]
Affiliations
[1] North Carolina State Univ, Dept Stat, 2311 Stinson Dr Campus Box 8203, Raleigh, NC 27695 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
Calibration; Causal inference; Inverse probability weighting; Missing confounder; Two-phase sampling; PROPENSITY SCORE CALIBRATION; DOUBLY ROBUST ESTIMATION; LARGE-SAMPLE PROPERTIES; AUXILIARY INFORMATION; MISSING CONFOUNDERS; MATCHING ESTIMATORS; VALIDATION DATA; REGRESSION; INFERENCE; 2-PHASE;
DOI
10.1080/01621459.2019.1609973
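Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code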
020208 ; 070103 ; 0714 ;
Abstract
The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators for causal effects, but the big main data can only give error-prone estimators in general. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiencies yet preserve the consistencies of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators. Supplementary materials for this article are available online.
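The abstract does not state the combining formula, so the following is only a minimal sketch of one way the idea could look, under loudly stated assumptions: a control-variate style adjustment in which the same error-prone estimator (one that ignores the confounder U) is computed on both the validation data and the main data, and their difference is used to reduce the variance of the validation-only estimator. The simulated data, the linear outcome model, the stratified bootstrap, and all function names (treatment_coef, estimate, simulate, combine_with_bootstrap) are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def treatment_coef(y, a, covariates):
    """OLS coefficient on treatment a, adjusting linearly for covariates."""
    design = np.column_stack([np.ones_like(y), a, covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


def estimate(y, a, x, u, is_val):
    """Return (validation-only estimate, error-prone difference).

    u is used only on validation rows, where it is observed.
    """
    yv, av, xv, uv = y[is_val], a[is_val], x[is_val], u[is_val]
    tau_val = treatment_coef(yv, av, np.column_stack([xv, uv]))
    delta_val_ep = treatment_coef(yv, av, xv)   # ignores u, validation data
    delta_main_ep = treatment_coef(y, a, x)     # ignores u, main data
    return tau_val, delta_val_ep - delta_main_ep


def simulate(n=5000, n_val=500, effect=2.0, seed=1):
    """Toy data (assumption): u confounds a and y, seen only in validation."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    a = rng.binomial(1, 1 / (1 + np.exp(-(x + u)))).astype(float)
    y = effect * a + x + u + rng.normal(size=n)
    is_val = np.zeros(n, dtype=bool)
    is_val[rng.choice(n, n_val, replace=False)] = True
    u_obs = np.where(is_val, u, np.nan)  # u missing outside validation
    return y, a, x, u_obs, is_val


def combine_with_bootstrap(y, a, x, u, is_val, n_boot=200, seed=0):
    """Adjust tau_val by the error-prone difference, scaled via bootstrap."""
    rng = np.random.default_rng(seed)
    val_idx = np.flatnonzero(is_val)
    rest_idx = np.flatnonzero(~is_val)
    reps = np.empty((n_boot, 2))
    for b in range(n_boot):
        # Resample validation and non-validation units separately.
        idx = np.concatenate([
            rng.choice(val_idx, val_idx.size),
            rng.choice(rest_idx, rest_idx.size),
        ])
        reps[b] = estimate(y[idx], a[idx], x[idx], u[idx], is_val[idx])
    cov = np.cov(reps, rowvar=False)
    gamma, v = cov[0, 1], cov[1, 1]
    tau_val, diff = estimate(y, a, x, u, is_val)
    tau_adj = tau_val - gamma / v * diff
    var_val = cov[0, 0]
    var_adj = cov[0, 0] - gamma ** 2 / v
    return tau_val, tau_adj, var_val, var_adj


y, a, x, u, is_val = simulate()
tau_val, tau_adj, var_val, var_adj = combine_with_bootstrap(y, a, x, u, is_val)
print("validation-only estimate:", tau_val, "bootstrap variance:", var_val)
print("adjusted estimate:", tau_adj, "bootstrap variance:", var_adj)
```

In this sketch the difference of the two error-prone estimates has mean close to zero because both target the same biased quantity, so subtracting a scaled multiple of it should leave the validation-based estimate consistent while never increasing its estimated variance. Any existing estimator could in principle replace treatment_coef, provided the same routine is applied to both data sources.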
Pages: 1540-1554
Number of pages: 15
Related papers
50 in total
  • [21] Bayesian Federated Estimation of Causal Effects from Observational Data
    Thanh Vinh Vo
    Lee, Young
    Trong Nghia Hoang
    Leong, Tze-Yun
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 2024 - 2034
  • [22] A semiparametric linear transformation model to estimate causal effects for survival data
    Lin, Huazhen
    Li, Yi
    Jiang, Liang
    Li, Gang
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2014, 42 (01): : 18 - 35
  • [23] Estimating Causal Effects in Linear Regression Models With Observational Data: The Instrumental Variables Regression Model
    Maydeu-Olivares, Alberto
    Shi, Dexin
    Fairchild, Amanda J.
    PSYCHOLOGICAL METHODS, 2020, 25 (02) : 243 - 258
  • [24] EXPOSURE EFFECTS ON COUNT OUTCOMES WITH OBSERVATIONAL DATA, WITH APPLICATION TO INCARCERATED WOMEN
    Shook-Sa, Bonnie E.
    Hudgens, Michael G.
    Knittel, Andrea K.
    Edmonds, Andrew
    Ramirez, Catalina
    Cole, Stephen R.
    Cohen, Mardge
    Adedimeji, Adebola
    Taylor, Tonya
    Michel, Katherine G.
    Kovacs, Andrea
    Cohen, Jennifer
    Donohue, Jessica
    Foster, Antonina
    Fischl, Margaret A.
    Long, Dustin
    Adimora, Adaora A.
    ANNALS OF APPLIED STATISTICS, 2024, 18 (03): : 2147 - 2165
  • [25] Estimating Causal Effects of Interventions on Early-life Environmental Exposures Using Observational Data
    Smith, Tyler J. S.
    Keil, Alexander P.
    Buckley, Jessie P.
    CURRENT ENVIRONMENTAL HEALTH REPORTS, 2023, 10 (01) : 12 - 21
  • [26] Identifying and estimating causal effects of bridge failures from observational data
    Çiftçioğlu A.Ö.
    Naser M.Z.
    Journal of Infrastructure Intelligence and Resilience, 2024, 3 (01):
  • [27] Estimating Causal Mediation Effects in Multiple-Mediator Analyses With Clustered Data
    Liu, Xiao
    JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2025,
  • [28] A New Estimation Approach for Combining Epidemiological Data From Multiple Sources
    Huang, Hui
    Ma, Xiamei
    Waagepetersen, Rasmus
    Holford, Theodore R.
    Wang, Rong
    Risch, Harvey
    Mueller, Lloyd
    Guan, Yongtao
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2014, 109 (505) : 11 - 23
  • [29] COMBINING INFORMATION FROM MULTIPLE DATA SOURCES TO ASSESS POPULATION HEALTH
    Raghunathan, Trivellore
    Ghosh, Kaushik
    Rosen, Allison
    Imbriano, Paul
    Stewart, Susan
    Bondarenko, Irina
    Messer, Kassandra
    Berglund, Patricia
    Shaffer, James
    Cutler, David
    JOURNAL OF SURVEY STATISTICS AND METHODOLOGY, 2021, 9 (03) : 598 - 625
  • [30] Causal Discovery with Heterogeneous Observational Data
    Zhou, Fangting
    He, Kejun
    Ni, Yang
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 2383 - 2393