A Hybrid Covariate Microaggregation Approach for Privacy-Preserving Logistic Regression

Times cited: 2
Authors
Juwara, Lamin [1]
Saha-Chaudhuri, Paramita [2]
Affiliations
[1] McGill Univ, Quantitat Life Sci Program, Montreal, PQ H3A 1G1, Canada
[2] Univ Vermont, Dept Math & Stat, Burlington, VT 05405 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Data privacy; Distributed data network; Logistic regression; Specimen pooling; Pooled exposure assessment; Care
DOI
10.1093/jssam/smac013
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification codes
03; 0303; 0701; 070101
Abstract
Distributed data networks (DDNs) with horizontally partitioned datasets are viable resources for multicenter research studies and pharmacosurveillance. Within DDNs, maintaining confidentiality and limiting the disclosure of sensitive information is critical. Consequently, data sharing between partners within the same network is either restricted or completely prohibited during statistical modeling. Current privacy-preserving methods for logistic regression span two extreme paradigms. Meta-analysis (MA), which combines partner-specific estimates, is convenient for the analytical center (AC) but requires each data node to implement the analysis separately. Distributed regression (DR), which computes overall estimates from partner-specific data summaries, produces rigorous solutions but is an iterative process that is both time- and resource-consuming. A practical middle ground that combines the convenience of MA with the rigor of DR is lacking. We propose such a likelihood-based approach to logistic regression modeling. The two-stage approach achieves estimation performance equivalent to that of DR but replaces DR's multiple iterations with a single MA-style update step, and is therefore more user-friendly. It uses only aggregate-level covariates to obtain an initial pooled effect estimate, and then uses within-node data summaries for a single-shot update of that pooled estimate, without requiring individual covariate values at the AC. We call the approach hybrid Pooled Logistic Regression (hPoLoR) and show that it provides accurate and efficient estimates of the standard individual-level log odds ratios and their standard errors without revealing personal data. Hence, hPoLoR is a rigorous yet convenient and application-friendly alternative to MA and DR. The method is demonstrated through extensive simulations and an application to the JCUSH data.
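The two-stage idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' hPoLoR estimator. It assumes that stage 1 is an ordinary binomial logistic fit at the AC to microaggregated records (group means of the covariates plus event counts; the grouping rule `microaggregate` and the group size `g` are hypothetical), and that the "single-shot update" is one Newton-Raphson step built from each node's score vector and Fisher information evaluated at the stage-1 estimate. Only group means, counts, and p-dimensional summaries reach the AC; the data are simulated.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_info(rows, beta):
    # Score vector and Fisher information of a binomial logistic likelihood
    # at beta. rows: list of (x, events, trials); an individual record is (x, y, 1).
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, events, trials in rows:
        mu = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        w = trials * mu * (1.0 - mu)
        for j in range(p):
            score[j] += x[j] * (events - trials * mu)
            for k in range(p):
                info[j][k] += w * x[j] * x[k]
    return score, info


def solve(A, b):
    # Solve A d = b by Gauss-Jordan elimination with partial pivoting.
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def newton_fit(rows, p, iters=25):
    # Full Newton-Raphson fit; in stage 1 the AC runs it on aggregated rows only.
    beta = [0.0] * p
    for _ in range(iters):
        score, info = score_info(rows, beta)
        beta = [b + d for b, d in zip(beta, solve(info, score))]
    return beta


def microaggregate(X, y, g):
    # HYPOTHETICAL stage-1 release: sort on the first non-intercept covariate,
    # cut into groups of g records, and share only (mean x, events, group size).
    order = sorted(range(len(y)), key=lambda i: X[i][1])
    groups = []
    for s in range(0, len(order), g):
        idx = order[s:s + g]
        xbar = [sum(X[i][j] for i in idx) / len(idx) for j in range(len(X[0]))]
        groups.append((xbar, sum(y[i] for i in idx), len(idx)))
    return groups


def one_step_update(beta0, summaries):
    # AC side: sum the node scores and informations, then take ONE Newton step.
    p = len(beta0)
    score = [sum(u[j] for u, _ in summaries) for j in range(p)]
    info = [[sum(inf[j][k] for _, inf in summaries) for k in range(p)] for j in range(p)]
    beta1 = [b + d for b, d in zip(beta0, solve(info, score))]
    # Model-based SEs: sqrt of diag(info^{-1}), with info evaluated at beta0.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta1, se


def simulate_node(n, beta, rng):
    # Toy data for one node: intercept plus one standard-normal covariate.
    X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [int(rng.random() < sigmoid(beta[0] + beta[1] * x[1])) for x in X]
    return X, y


rng = random.Random(1)
nodes = [simulate_node(400, [-0.5, 0.8], rng) for _ in range(3)]

# Stage 1 (AC): fit to the pooled microaggregated rows from all nodes.
pooled = [row for X, y in nodes for row in microaggregate(X, y, g=5)]
beta0 = newton_fit(pooled, p=2)

# Stage 2 (nodes -> AC): each node sends its score and information at beta0.
summaries = [score_info([(x, yi, 1) for x, yi in zip(X, y)], beta0) for X, y in nodes]
beta1, se = one_step_update(beta0, summaries)
print("start:", beta0, "updated:", beta1, "SE:", se)
```

Repeating the stage-2 update until the estimate stops changing would reproduce an iterative DR-style fit (the individual-level maximum likelihood estimate); the sketch stops after one step, which is the saving the abstract describes. Evaluating the standard errors at the starting value is a simplifying assumption of this sketch.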
Pages: 568-595
Number of pages: 28