Privacy-Preserving Federated Learning Framework for Multi-Source Electronic Health Records Prognosis Prediction

Cited by: 0
Authors
Zhao, Huiya [2, 3]
Sui, Dehao [2, 3]
Wang, Yasha [2, 3]
Ma, Liantao [2, 3]
Wang, Ling [1]
Affiliations
[1] Xuzhou Med Univ, Affiliated Xuzhou Municipal Hosp, Xuzhou 221002, Peoples R China
[2] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing 100871, Peoples R China
[3] Minist Educ, Key Lab High Confidence Software Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
federated learning; healthcare privacy; multi-institutional collaboration
DOI
10.3390/s25082374
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Secure and privacy-preserving health status representation learning has become a critical challenge in clinical prediction systems. While deep learning models require substantial high-quality data for training, electronic health records are often restricted by strict privacy regulations and institutional policies, particularly during emerging health crises. Traditional approaches to data integration across medical institutions face significant privacy and security challenges, as healthcare providers cannot directly share patient data. This work presents MultiProg, a secure federated learning framework for clinical representation learning. Our approach enables multiple medical institutions to collaborate without exchanging raw patient data, maintaining data locality while improving model performance. The framework employs a multi-channel architecture where institutions share only the low-level feature extraction layers, protecting sensitive patient information. We introduce a feature calibration mechanism that ensures robust performance even with heterogeneous feature sets across different institutions. Through extensive experiments, we demonstrate that the framework successfully enables secure knowledge sharing across institutions without compromising sensitive patient data, achieving enhanced predictive capabilities compared to isolated institutional models. Compared to state-of-the-art methods, our approach achieves the best performance across multiple datasets with statistically significant improvements.
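The abstract states that institutions share only the low-level feature extraction layers and keep everything else local. The sketch below illustrates that idea under stated assumptions; it is not the MultiProg implementation. The aggregation rule (a sample-size-weighted average in the style of federated averaging), the `extractor.` naming prefix, the helper names, and the toy parameter values are all hypothetical, and the feature calibration mechanism mentioned in the abstract is not modeled.

```python
from typing import Dict, List

# A model is represented as a mapping from parameter name to a flat list of floats.
Params = Dict[str, List[float]]

# Assumption: parameters whose names start with this prefix belong to the
# shared low-level feature extraction layers. The prefix is hypothetical.
SHARED_PREFIX = "extractor."


def average_shared_layers(clients: List[Params], sizes: List[int]) -> Params:
    """Return the sample-size-weighted average of the shared parameters only."""
    total = sum(sizes)
    shared_names = [n for n in clients[0] if n.startswith(SHARED_PREFIX)]
    averaged: Params = {}
    for name in shared_names:
        length = len(clients[0][name])
        averaged[name] = [
            sum(c[name][i] * s for c, s in zip(clients, sizes)) / total
            for i in range(length)
        ]
    return averaged


def apply_shared(local: Params, shared: Params) -> Params:
    """Copy a local model and overwrite its shared layers; private layers are unchanged."""
    updated = {name: list(values) for name, values in local.items()}
    for name, values in shared.items():
        updated[name] = list(values)
    return updated


# Toy parameters for two institutions (illustrative values, not real data).
hospital_a = {"extractor.w": [1.0, 2.0], "head.w": [0.5]}
hospital_b = {"extractor.w": [3.0, 6.0], "head.w": [-0.5]}

# Only the extractor weights are exchanged; sizes are local sample counts.
shared = average_shared_layers([hospital_a, hospital_b], sizes=[100, 300])
hospital_a = apply_shared(hospital_a, shared)
hospital_b = apply_shared(hospital_b, shared)
```

In this sketch only parameter values for the extractor leave each site; raw records and the institution-specific prediction heads stay local, which mirrors the data-locality property the abstract describes.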
Pages: 15