Federated singular value decomposition for high-dimensional data

被引:0
作者
Anne Hartebrodt
Richard Röttger
David B. Blumenthal
机构
[1] University of Southern Denmark,Department of Mathematics and Computer Science
[2] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),Department Artificial Intelligence in Biomedical Engineering (AIBE)
来源
Data Mining and Knowledge Discovery | 2024年 / 38卷
关键词
Singular value decomposition; Federated learning; Principal component analysis; Genome-wide association studies;
D O I
暂无
中图分类号
学科分类号
摘要
Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, the sensitive data remains in data silos and only aggregated parameters are exchanged. Hospitals and research institutions which are not willing to share their data can join a federated study without breaching confidentiality. In addition to the extreme sensitivity of biomedical data, the high dimensionality poses a challenge in the context of federated genome-wide association studies (GWAS). In this article, we present a federated singular value decomposition algorithm, suitable for the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost independent of the number of samples and is only weakly dependent on the number of features, because the singular vectors corresponding to the samples are never exchanged and the vectors associated with the features are only transmitted to an aggregator for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable for both horizontally and vertically partitioned data.
引用
收藏
页码:938 / 975
页数:37
相关论文
共 68 条
[1]  
Balcan MF(2016)An improved gap-dependency analysis of the noisy power method J Mach Learn Res 49 284-309
[2]  
Du SS(2013)A near-optimal algorithm for differentially-private principal components J Mach Learn Res 14 2905-2943
[3]  
Wang Y(2018)Secure genome-wide association analysis using multiparty computation Nat Biotechnol 36 547-551
[4]  
Chaudhuri K(2013)The algorithmic foundations of differential privacy Found Trends Theor Comput Sci 9 211-407
[5]  
Sarwate AD(2016)Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia Am J Hum Genet 98 456-472
[6]  
Sinha K(2019)Consequences of PCA graphs, SNP codings, and PCA variants for elucidating population structure PLoS ONE 14 1-26
[7]  
Cho H(2018)Smooth sensitivity based approach for differentially private principal component analysis J Mach Learn Res 1 1-48
[8]  
Wu DJ(2012)A covariance-free iterative algorithm for distributed principal component analysis on vertically partitioned data Pattern Recognit 45 1211-1219
[9]  
Berger B(2011)Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions SIAM Rev 53 217-288
[10]  
Dwork C(2015)The MovieLens datasets: history and context ACM Trans Interact Intell Syst 14 1-210