Privacy-Preserving Collaborative Learning for Genome Analysis via Secure XGBoost

Cited by: 0
Authors
Aldeen, Mohammed Shujaa [1]
Zhao, Chuan [2]
Chen, Zhenxiang [1]
Fang, Liming [3]
Liu, Zhe [4]
Affiliations
[1] Univ Jinan, Shandong Prov Key Lab Network based Intelligent Co, Jinan 250102, Peoples R China
[2] Quan Cheng Lab, Jinan 250103, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[4] Zhejiang Lab, Hangzhou 311121, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Bioinformatics; Training; Data models; Genomics; Cryptography; Data privacy; Computational modeling; Genome analysis; gradient descent; collaborative learning; secure XGBoost; Intel SGX; privacy-preserving; COMPUTATION;
DOI
10.1109/TDSC.2024.3384244
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Genomic data is usually stored in a decentralized manner among data providers, who cannot share it publicly due to privacy concerns. A significant technical challenge is to combine machine learning and cryptographic techniques to build secure machine learning models over distributed datasets without violating privacy. In collaborative machine learning, data providers want to maintain the privacy of their genomic data, while the researcher who owns the training model wants to keep the model and training methods confidential. This paper proposes a framework that supports secure collaborative learning tasks while protecting both the participants' genomic data and the training model information. Using a cluster of Intel SGX enclaves, our work performs fast distributed training over these enclaves, with a dedicated enclave used solely for updating the global model. Secure XGBoost is also implemented over these hardware enclaves for fast learning, and the enclaves' security is strengthened with data-oblivious algorithms that eliminate side-channel attacks. The experimental results show that our scheme is fast and efficient in collaborative learning systems without an increase in communication overhead, making it practical for large genomic data.
Pages: 5755-5765
Number of pages: 11
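
The abstract mentions data-oblivious algorithms as a defense against side-channel attacks inside enclaves. The record does not describe those algorithms, so the following is only a minimal illustrative sketch of the general idea, not the paper's method: the hypothetical helpers `oblivious_select` and `oblivious_max` touch every element and use bit masks instead of secret-dependent indexing or `if`/`else` selection. They assume integer inputs, and Python itself gives no constant-time guarantees, so this shows the access pattern only.

```python
def oblivious_select(values, secret_index):
    """Return values[secret_index] while reading every element once."""
    result = 0
    for i, v in enumerate(values):
        # mask is -1 (all bits set) when i matches, otherwise 0
        mask = -int(i == secret_index)
        result |= v & mask
    return result


def oblivious_max(values):
    """Return the maximum using mask-based selection instead of if/else."""
    best = values[0]
    for v in values[1:]:
        take = -int(v > best)  # -1 if v is larger, else 0
        best = (v & take) | (best & ~take)
    return best
```

In both helpers the sequence of memory reads is the same for any secret index or any ordering of the data, which is the property a data-oblivious routine aims for.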