Vertical federated learning based on data subset representation for healthcare application

Cited by: 0
Authors
Shi, Yukun [1]
Zhang, Jilin [1]
Xue, Meiting [1]
Zeng, Yan [2]
Jia, Gangyong [2]
Yu, Qihong [2]
Li, Miaoqi [2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Cyberspace, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vertical federated learning; Latent feature representation; Smart healthcare; Privacy preservation;
DOI
10.1016/j.cmpb.2025.108623
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Background and Objective: Artificial intelligence is increasingly essential for disease classification and clinical diagnosis in healthcare. Because healthcare data are subject to strict privacy requirements, Vertical Federated Learning (VFL) has been introduced. VFL allows multiple hospitals to collaboratively train models on vertically partitioned data, where each hospital holds only part of each patient's features, thus maintaining patient confidentiality. However, applying VFL in healthcare scenarios with few samples and labels is challenging, because existing methods depend heavily on labeled samples and do not consider the intrinsic connections among the data held by different hospitals.
Methods: This paper proposes FedRL, a representation-based VFL method that improves downstream task performance by using aligned data for federated representation pretraining. The method splits each hospital's local data into subsets with the same feature dimension, exploits the relationships among these subsets through a bespoke loss function, and collaboratively trains a representation model on these subsets across all participating hospitals. The model captures latent representations of the global data, which are then applied to downstream classification tasks.
Results and Conclusion: FedRL was validated through experiments on three healthcare datasets. The results show that it outperforms several existing methods on three performance metrics. Specifically, FedRL achieves average improvements of 4.7%, 5.6%, and 4.8% in accuracy, AUC, and F1-score, respectively, over current methods. In addition, FedRL is more robust and performs consistently in scenarios with limited labeled samples, which supports its effectiveness and potential use in healthcare data analysis.
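The subset-splitting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how FedRL splits the data or what its loss function looks like, so everything below is an assumption made for illustration: the function name `split_into_subsets`, the `overlap` parameter, the wrap-around column indexing, and the two-hospital partition are all hypothetical and are not taken from the paper. The sketch only shows one way that hospitals holding different numbers of features could each produce local subsets with the same feature dimension.

```python
import math
import numpy as np

def split_into_subsets(X, n_subsets, overlap=0.0):
    """Split the columns of X into n_subsets blocks of equal width.

    Hypothetical helper (not from the paper). Neighbouring blocks share
    columns when overlap > 0, and column indices wrap around so that
    every block has exactly the same number of columns.
    """
    n_features = X.shape[1]
    base = math.ceil(n_features / n_subsets)   # columns per block before overlap
    width = base + int(round(base * overlap))  # final, shared block width
    if width > n_features:
        raise ValueError("subset width exceeds the number of local features")
    subsets = []
    for i in range(n_subsets):
        start = i * base
        cols = np.arange(start, start + width) % n_features  # wrap around
        subsets.append(X[:, cols])
    return subsets

# Toy data: 5 aligned patients with 10 features in total (random values).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))

# Assumed vertical partition: hospital A holds 6 features, hospital B holds 4.
X_a, X_b = X[:, :6], X[:, 6:]

# Each hospital splits its own columns locally; both end up with width-3 subsets.
subsets_a = split_into_subsets(X_a, n_subsets=3, overlap=0.5)
subsets_b = split_into_subsets(X_b, n_subsets=2, overlap=0.5)
```

In this toy setup both hospitals end up with subsets of three columns each, so a single encoder architecture could in principle be applied to every subset. How FedRL actually relates the subsets to one another and trains the shared representation model is described in the paper itself, not here.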
Pages: 11