Privacy-Preserving Collaborative Learning for Genome Analysis via Secure XGBoost

Cited by: 0

Authors:

Aldeen, Mohammed Shujaa [1]

Zhao, Chuan [2]

Chen, Zhenxiang [1]

Fang, Liming [3]

Liu, Zhe [4]

Affiliations:

[1] Univ Jinan, Shandong Prov Key Lab Network based Intelligent Co, Jinan 250102, Peoples R China

[2] Quan Cheng Lab, Jinan 250103, Peoples R China

[3] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China

[4] Zhejiang Lab, Hangzhou 311121, Peoples R China

Source:

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING | 2024 / Vol. 21 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Bioinformatics; Training; Data models; Genomics; Cryptography; Data privacy; Computational modeling; Genome analysis; gradient descent; collaborative learning; secure XGBoost; Intel-SGX; privacy-preserving; COMPUTATION;

DOI:

10.1109/TDSC.2024.3384244

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Genomic data is usually stored in a decentralized manner among data providers, who cannot share it publicly due to privacy concerns. A significant technical challenge is to combine machine learning and cryptographic techniques to build secure machine learning models over distributed datasets without violating privacy. In collaborative machine learning, data providers want to maintain the privacy of their genomic data, while the researcher who owns the training model wants to keep the model and training methods confidential. This paper proposes a framework that supports secure collaborative learning tasks while disclosing neither the participants' genomic data nor the training model information. Using a cluster of Intel SGX enclaves, our work performs fast distributed training over these enclaves, with a dedicated enclave used solely for updating the global model. In addition, Secure XGBoost is implemented over these hardware enclaves for fast learning, and it strengthens the enclaves' security with data-oblivious algorithms that eliminate side-channel attacks. The experimental results show that our scheme is fast and efficient in collaborative learning systems without increasing communication overhead, making it practical for large genomic datasets.


Pages: 5755-5765

Page count: 11

