Privacy-Preserving Collaborative Learning for Genome Analysis via Secure XGBoost

Cited by: 0
Authors
Aldeen, Mohammed Shujaa [1]
Zhao, Chuan [2]
Chen, Zhenxiang [1]
Fang, Liming [3]
Liu, Zhe [4]
Affiliations
[1] Univ Jinan, Shandong Prov Key Lab Network based Intelligent Co, Jinan 250102, Peoples R China
[2] Quan Cheng Lab, Jinan 250103, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[4] Zhejiang Lab, Hangzhou 311121, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Bioinformatics; Training; Data models; Genomics; Cryptography; Data privacy; Computational modeling; Genome analysis; gradient descent; collaborative learning; secure XGBoost; Intel SGX; privacy-preserving; COMPUTATION;
DOI
10.1109/TDSC.2024.3384244
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Genomic data is usually stored in a decentralized manner among data providers, who cannot share it publicly due to privacy concerns. A significant technical challenge is to combine machine learning and cryptographic techniques to build secure machine learning models over distributed datasets without violating privacy. In collaborative machine learning, data providers want to keep their genomic data private, while the researcher who owns the training model wants to keep the model and training methods confidential. This paper proposes a framework that supports secure collaborative learning tasks while disclosing neither the participants' genomic data nor the training model information. Using a cluster of Intel SGX enclaves, our work performs fast distributed training across these enclaves, with a dedicated enclave used solely for updating the global model. Secure XGBoost is implemented on these hardware enclaves for fast learning, and its data-oblivious algorithms strengthen the enclaves' security by eliminating side-channel attacks. Experimental results show that our scheme is fast and efficient in collaborative learning systems and does not increase communication overhead, making it practical for large genomic datasets.
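The abstract does not spell out the data-oblivious algorithms it relies on. As a rough illustration of the general idea only, the sketch below picks the largest value from a list while touching every element and choosing with arithmetic masks rather than data-dependent branches. The names `oblivious_select` and `oblivious_argmax` are hypothetical and do not come from the paper or from the Secure XGBoost code base, and the Python interpreter gives no timing or memory-access guarantees, so this shows the access pattern rather than a hardened implementation.

```python
from typing import Sequence


def oblivious_select(cond: int, a: int, b: int) -> int:
    """Return a if cond == 1, else b, using bit masks instead of a branch."""
    mask = -cond  # cond == 1 gives -1 (all bits set); cond == 0 gives 0
    return (a & mask) | (b & ~mask)


def oblivious_argmax(values: Sequence[int]) -> int:
    """Index of the first maximum; every element is visited exactly once.

    Assumption for illustration: values are integers, for example
    fixed-point encoded split gains.
    """
    best_idx = 0
    best_val = values[0]
    for i in range(1, len(values)):
        gt = int(values[i] > best_val)  # 1 if the new value is larger, else 0
        best_idx = oblivious_select(gt, i, best_idx)
        best_val = oblivious_select(gt, values[i], best_val)
    return best_idx
```

For example, `oblivious_argmax([3, 9, 2, 9, 1])` scans all five entries regardless of where the maximum sits, so the sequence of positions read does not depend on the data.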
Pages: 5755-5765
Page count: 11