Scaling survival analysis in healthcare with federated survival forests: A comparative study on heart failure and breast cancer genomics

Cited by: 6
Authors
Archetti, Alberto [1]
Ieva, Francesca [2,3]
Matteucci, Matteo [1]
Affiliations
[1] Politecn Milan, Dept Elect Informat & Bioengn DEIB, Via Ponzio 34, I-20133 Milan, Italy
[2] Politecn Milan, Dept Math, Via Bonardi 9, I-20133 Milan, Italy
[3] Hlth Data Sci Ctr, Human Technopole, Viale R Levi Montalcini 1, I-20157 Milan, Italy
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2023 / Vol. 149
Funding
EU Horizon 2020;
Keywords
Survival analysis; Federated learning; Random survival forest; Heart failure; Breast cancer; CHALLENGES; PREDICTION; MODELS;
DOI
10.1016/j.future.2023.07.036
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Survival analysis is a fundamental tool in medicine, modeling the time until an event of interest occurs in a population. However, in real-world applications, survival data are often incomplete, censored, distributed, and confidential, especially in healthcare settings where privacy is critical. The scarcity of data can severely limit the scalability of survival models to distributed applications that rely on large data pools. Federated learning is a promising technique that enables machine learning models to be trained on multiple datasets without compromising user privacy, making it particularly well-suited for addressing the challenges of survival data and large-scale survival applications. Despite significant developments in federated learning for classification and regression, many directions remain unexplored in the context of survival analysis. In this work, we propose an extension of the Federated Survival Forest algorithm, called FedSurF++. This federated ensemble method constructs random survival forests in heterogeneous federations. Specifically, we investigate several new methods for sampling trees from client forests and compare the results with state-of-the-art survival models based on neural networks. The key advantage of FedSurF++ is its ability to achieve performance comparable to existing methods while requiring only a single communication round to complete. The extensive empirical investigation results in a significant improvement from the algorithmic and privacy preservation perspectives, making the original FedSurF algorithm more efficient, robust, and private. We also present results on two real-world datasets (a heart failure dataset from the Lombardy HFData project and Fed-TCGA-BRCA from the FLamby suite), demonstrating the success of FedSurF++ in real-world healthcare studies.
Our results underscore the potential of FedSurF++ to improve the scalability and effectiveness of survival analysis in distributed settings while preserving user privacy. (c) 2023 Elsevier B.V. All rights reserved.
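The one-round ensemble idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sample_trees` and `predict_survival`, the representation of a tree as a plain callable, and the choice to sample trees in proportion to each client's local sample count are all assumptions made for illustration. The paper itself compares several tree sampling methods, which are not reproduced here.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in for a survival tree: any callable that maps a
# feature vector to a survival curve evaluated on a shared time grid.
SurvivalTree = Callable[[Sequence[float]], List[float]]


def sample_trees(
    client_forests: List[List[SurvivalTree]],
    client_sizes: List[int],
    n_trees: int,
    rng: random.Random,
) -> List[SurvivalTree]:
    """Server side, single round: pick trees from each client's forest.

    Assumption: each client's share of the ensemble is proportional to
    its local sample count (one plausible sampling rule among many).
    """
    total = sum(client_sizes)
    ensemble: List[SurvivalTree] = []
    for forest, size in zip(client_forests, client_sizes):
        share = max(1, round(n_trees * size / total))
        share = min(share, len(forest))  # cannot take more trees than exist
        ensemble.extend(rng.sample(forest, share))
    return ensemble


def predict_survival(ensemble: List[SurvivalTree], x: Sequence[float]) -> List[float]:
    """Average the per-tree survival curves point by point."""
    curves = [tree(x) for tree in ensemble]
    return [sum(values) / len(curves) for values in zip(*curves)]


def constant_tree(curve: List[float]) -> SurvivalTree:
    """Toy tree that ignores its input and returns a fixed curve."""
    return lambda x: list(curve)


# Toy federation: two clients with locally trained (here: constant) trees.
forest_a = [constant_tree([1.0, 0.5]) for _ in range(4)]
forest_b = [constant_tree([1.0, 0.0]) for _ in range(4)]

ensemble = sample_trees([forest_a, forest_b], [300, 100], n_trees=4, rng=random.Random(0))
print(predict_survival(ensemble, [0.0]))  # prints [1.0, 0.375]
```

In this toy setup the larger client contributes three of the four trees, so its survival curve dominates the averaged prediction. Only the trained trees travel to the server, and they are sent once, which is the property the abstract refers to as a single communication round.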
Pages: 343-358
Number of pages: 16