Differentially Private Multi-Site Treatment Effect Estimation

Times cited: 0
Authors
Koga, Tatsuki [1]
Chaudhuri, Kamalika [1]
Page, David [2]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Duke Univ, Dept Biostat, Durham, NC USA
Source
IEEE CONFERENCE ON SAFE AND TRUSTWORTHY MACHINE LEARNING, SATML 2024 | 2024
Keywords
Differential Privacy; Average Treatment Effect; Federated Analysis; TRIALS; NOISE;
DOI
10.1109/SaTML59370.2024.00030
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Patient privacy is a major barrier to healthcare AI. For confidentiality reasons, most patient data remains siloed in separate hospitals, preventing the design of data-driven healthcare AI systems that need large volumes of patient data to make effective decisions. A solution to this is collective learning across multiple sites through federated learning with differential privacy. However, literature in this space typically focuses on differentially private statistical estimation and machine learning, which is different from the causal inference-related problems that arise in healthcare. In this work, we take a fresh look at federated learning with a focus on causal inference; specifically, we look at estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications, and provide a federated analytics approach to enable ATE estimation across multiple sites along with differential privacy (DP) guarantees at each site. The main challenge comes from site heterogeneity: different sites have different sample sizes and privacy budgets. We address this through a class of per-site estimation algorithms that report the ATE estimate and its variance as a quality measure, and an aggregation algorithm on the server side that minimizes the overall variance of the final ATE estimate. Our experiments on real and synthetic data show that our method reliably aggregates private statistics across sites and provides a better privacy-utility tradeoff under site heterogeneity than baselines.
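The abstract does not spell out the server-side aggregation rule. As a rough illustration only, the sketch below uses textbook inverse-variance weighting, which gives the minimum-variance unbiased linear combination of independent unbiased estimates. It is an assumption that this resembles the paper's aggregation step; the function name `aggregate_inverse_variance` and the example numbers are hypothetical, and each reported variance is assumed to already account for both sampling error and any DP noise added at the site.

```python
def aggregate_inverse_variance(estimates, variances):
    """Combine independent, unbiased per-site estimates.

    Illustrative sketch only (not the paper's algorithm): each site is
    weighted in proportion to 1 / variance, so noisier sites count less.
    Returns (combined_estimate, combined_variance, weights).
    """
    if len(estimates) != len(variances) or len(estimates) == 0:
        raise ValueError("need one positive variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]

    combined = sum(w * e for w, e in zip(weights, estimates))
    # Variance of the weighted sum under independence: 1 / sum(1 / var_i).
    combined_variance = 1.0 / total_precision
    return combined, combined_variance, weights


# Hypothetical example: two sites report ATE estimates 1.0 and 3.0
# with variances 1.0 and 3.0 respectively.
ate, ate_var, site_weights = aggregate_inverse_variance([1.0, 3.0], [1.0, 3.0])
print(ate, ate_var, site_weights)
```

In this toy case the more precise site receives three times the weight of the noisier one, and the combined variance is lower than either site's own variance, which is the kind of behavior the abstract attributes to variance-aware aggregation under site heterogeneity.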
Pages: 472-489
Page count: 18