A Hybrid Covariate Microaggregation Approach for Privacy-Preserving Logistic Regression

Cited by: 2
Authors
Juwara, Lamin [1]
Saha-Chaudhuri, Paramita [2]
Affiliations
[1] McGill Univ, Quantitat Life Sci Program, Montreal, PQ H3A 1G1, Canada
[2] Univ Vermont, Dept Math & Stat, Burlington, VT 05405 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Data privacy; Distributed data network; Logistic regression; Specimen pooling; POOLED EXPOSURE ASSESSMENT; CARE;
DOI
10.1093/jssam/smac013
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification code
03; 0303; 0701; 070101
Abstract
Distributed data networks (DDNs) with horizontally partitioned datasets are viable resources for multicenter research studies and pharmacosurveillance. Within DDNs, maintaining confidentiality and limiting the disclosure of sensitive information are critical. Consequently, data sharing between partners within the same network is either restricted or completely prohibited during statistical modeling. Current privacy-preserving methods for logistic regression span two extreme paradigms. Meta-analysis (MA), which combines partner-specific estimates, is convenient for the analytical center (AC) but requires each data node to implement the analysis separately. Distributed regression (DR), which provides overall estimates based on partner-specific data summaries, produces rigorous solutions but is an iterative process that is both time- and resource-consuming. A practical middle ground that combines the convenience of MA and the rigor of DR is lacking. We propose a likelihood-based approach for logistic regression modeling that fills this gap. The two-stage approach has estimation performance equivalent to that of DR but forgoes its multiple iterative steps through an MA update step, and is therefore more user-friendly. The approach uses only aggregate-level covariates to obtain a starting pooled effect estimate, and within-node data summaries for a single-shot update of that pooled estimate, without requiring individual covariate values at the AC. We call the approach hybrid Pooled Logistic Regression (hPoLoR) and show that it conveniently provides accurate and efficient estimates of the standard individual-level log odds ratios and standard errors without revealing personal data. Hence hPoLoR provides a rigorous yet convenient and application-friendly alternative to MA and DR. The method is demonstrated through extensive simulations and an application to the JCUSH data.
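The two-stage idea in the abstract (a starting estimate built from aggregate information, followed by a single update from within-node summaries) can be illustrated with a generic one-step Newton update for logistic regression. The sketch below is not the authors' hPoLoR algorithm: the function names `node_summaries` and `one_shot_update`, the simulated data, and the crude starting value (an intercept-only fit from aggregate outcome counts, standing in for the paper's aggregate-covariate estimate) are all assumptions made for illustration.

```python
import numpy as np

def node_summaries(X, y, beta):
    """Run locally at one data node: return only the score vector and
    information matrix at beta, never the individual-level rows."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    score = X.T @ (y - p)
    info = (X * (p * (1.0 - p))[:, None]).T @ X
    return score, info

def one_shot_update(beta0, summaries):
    """Run at the analytical center: sum the node summaries and take
    a single Newton step from the starting estimate beta0."""
    score = sum(s for s, _ in summaries)
    info = sum(i for _, i in summaries)
    beta1 = beta0 + np.linalg.solve(info, score)
    # Standard errors from the summed information evaluated at beta0
    # (a simplification chosen for this sketch).
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta1, se

# Simulated horizontally partitioned data: three hypothetical nodes.
rng = np.random.default_rng(0)
beta_true = np.array([-0.5, 0.8])
nodes = []
for n in (400, 500, 600):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    nodes.append((X, y))

# Assumed starting value: intercept-only estimate from aggregate counts.
ybar = sum(y.sum() for _, y in nodes) / sum(len(y) for _, y in nodes)
beta0 = np.array([np.log(ybar / (1.0 - ybar)), 0.0])

# Each node sends one (score, information) pair; the center updates once.
summaries = [node_summaries(X, y, beta0) for X, y in nodes]
beta1, se = one_shot_update(beta0, summaries)
print("starting estimate:", beta0)
print("one-shot estimate:", beta1)
print("standard errors:  ", se)
```

In this sketch each node communicates with the center once, in contrast to the repeated rounds of a fully iterative distributed regression. How close a single step lands to the full-data estimate depends on the quality of the starting value, which is why the starting estimator matters in a two-stage scheme.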
Pages: 568-595
Number of pages: 28