Calibrating predictive model estimates in a distributed network of patient data

Times cited: 4
Authors
Huang, Yingxiang [1]
Jiang, Xiaoqian [2]
Gabriel, Rodney A. [1,3]
Ohno-Machado, Lucila [1,4]
Affiliations
[1] Univ Calif San Diego, UC San Diego Hlth Dept Biomed Informat, La Jolla, CA 92093 USA
[2] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, Houston, TX 77030 USA
[3] Univ Calif San Diego, Dept Anesthesiol, La Jolla, CA USA
[4] VA San Diego Healthcare Syst, Div Hlth Serv Res & Dev, San Diego, CA 92161 USA
Funding
National Institutes of Health (NIH)
Keywords
Calibration; Binary classifier; Model evaluation; Isotonic regression; Federated learning; Data privacy; REGRESSION; PRIVACY; CARE
DOI
10.1016/j.jbi.2021.103758
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Background: Protecting the privacy of patient data is an important issue. Patient data are typically kept within local health systems, which makes it difficult to integrate data across different healthcare systems. Building high-performance predictive models requires a large number of samples, and evaluating those models requires performance measures such as calibration and discrimination. Distributed algorithms for building models and for measuring discrimination have been published, but distributed algorithms for measuring calibration and recalibrating models have not been proposed.

Objective: Recalibration models have been shown to improve calibration, but they have not been proposed for data distributed across multiple health systems, or "sites". Our goal is to measure calibration performance and build a global recalibration model using data from multiple health systems without sharing patient-level data.

Materials and Methods: We developed a distributed smooth isotonic regression recalibration model and extended established calibration measures, such as the Hosmer-Lemeshow test, Expected Calibration Error, and Maximum Calibration Error, to the distributed setting.

Results: Experiments were conducted on both simulated and clinical data, and the recalibration results of a traditional (i.e., centralized) smooth isotonic regression were compared with those of the distributed version. The results were identical.

Discussion: Our algorithms demonstrate that calibration can be measured and improved in a distributed manner while protecting data privacy, albeit at some cost in computational efficiency. They also give researchers who have too few instances at their own institutions a way to construct robust recalibration models.

Conclusion: Preserving data privacy and improving model calibration are both important to advancing predictive analysis in clinical informatics. The proposed algorithms reduce the difficulty of building models across sites.
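The abstract does not describe the protocol itself. As an illustration only, the Python sketch below assumes a simple aggregate-sharing scheme: each site places its own predicted probabilities into fixed equal-width bins and shares only per-bin counts, sums of predictions, and counts of positive outcomes; a coordinator adds these up and computes Expected Calibration Error (ECE), Maximum Calibration Error (MCE), a fixed-cutpoint Hosmer-Lemeshow statistic, and a binned isotonic recalibration via weighted pool-adjacent-violators (PAV). All names (`BinStats`, `site_summary`, `pool`, `ece_mce`, `hosmer_lemeshow`, `binned_isotonic`) are hypothetical. Equal-width bins are an assumption (the decile-of-risk Hosmer-Lemeshow grouping would additionally need a distributed quantile step), and binned PAV is a simplification, not the authors' smooth isotonic regression. Because these quantities depend on the data only through additive per-bin sums, the pooled result matches the centralized one up to floating-point rounding.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical aggregate-sharing scheme for illustration; not the paper's protocol.


@dataclass
class BinStats:
    """Per-bin aggregates a site shares instead of patient-level rows."""
    count: int = 0         # patients whose predicted probability falls in the bin
    sum_pred: float = 0.0  # sum of predicted probabilities (expected positives)
    sum_obs: int = 0       # observed positive outcomes


def site_summary(preds: Sequence[float], labels: Sequence[int],
                 n_bins: int = 10) -> List[BinStats]:
    """Runs locally at one site; only the returned aggregates leave the site."""
    bins = [BinStats() for _ in range(n_bins)]
    for p, y in zip(preds, labels):
        b = min(int(p * n_bins), n_bins - 1)  # assumed equal-width bins on [0, 1]
        bins[b].count += 1
        bins[b].sum_pred += p
        bins[b].sum_obs += y
    return bins


def pool(summaries: Sequence[List[BinStats]]) -> List[BinStats]:
    """Coordinator adds the per-bin aggregates received from all sites."""
    pooled = [BinStats() for _ in summaries[0]]
    for summary in summaries:
        for total, st in zip(pooled, summary):
            total.count += st.count
            total.sum_pred += st.sum_pred
            total.sum_obs += st.sum_obs
    return pooled


def ece_mce(pooled: List[BinStats]) -> Tuple[float, float]:
    """Weighted-average and maximum gap between observed rate and mean prediction."""
    n = sum(b.count for b in pooled)
    ece, mce = 0.0, 0.0
    for b in pooled:
        if b.count == 0:
            continue
        gap = abs(b.sum_obs / b.count - b.sum_pred / b.count)
        ece += (b.count / n) * gap
        mce = max(mce, gap)
    return ece, mce


def hosmer_lemeshow(pooled: List[BinStats]) -> Tuple[float, int]:
    """HL chi-square over non-empty fixed-cutpoint bins; returns (statistic, df)."""
    stat, groups = 0.0, 0
    for b in pooled:
        if b.count == 0:
            continue
        groups += 1
        e, o = b.sum_pred, b.sum_obs
        denom = e * (1 - e / b.count)  # E_g * (1 - mean predicted probability)
        if denom > 0:
            stat += (o - e) ** 2 / denom
    return stat, groups - 2


def binned_isotonic(pooled: List[BinStats]) -> List[float]:
    """Weighted PAV over non-empty bins, which are already ordered by score.

    Returns one non-decreasing recalibrated probability per non-empty bin.
    """
    blocks: List[List[float]] = []  # [sum of outcomes, weight, bins merged]
    for b in pooled:
        if b.count == 0:
            continue
        blocks.append([float(b.sum_obs), float(b.count), 1.0])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, w, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w
            blocks[-1][2] += k
    fitted: List[float] = []
    for s, w, k in blocks:
        fitted.extend([s / w] * int(k))
    return fitted


# Usage with two toy sites (made-up numbers).
site_a = site_summary([0.05, 0.15, 0.25, 0.85, 0.95], [0, 0, 1, 1, 1])
site_b = site_summary([0.12, 0.35, 0.55, 0.65, 0.92], [0, 1, 0, 1, 1])
pooled = pool([site_a, site_b])
ece, mce = ece_mce(pooled)
hl_stat, hl_df = hosmer_lemeshow(pooled)
recalibrated = binned_isotonic(pooled)
print(f"ECE={ece:.3f}  MCE={mce:.3f}  HL={hl_stat:.3f} (df={hl_df})")
```

Running `site_summary` on the concatenated rows of both sites and passing the result to the same functions should reproduce these values. Per-bin aggregates can still reveal information when a bin holds very few patients, so a real deployment would need further protection, such as secure aggregation.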
Pages: 11