PrivateDL: Privacy-preserving collaborative deep learning against leakage from gradient sharing

Times cited: 33
Authors
Zhao, Qi [1, 2]
Zhao, Chuan [1, 2, 3]
Cui, Shujie [4]
Jing, Shan [1, 2]
Chen, Zhenxiang [1, 2]
Affiliations
[1] Univ Jinan, Sch Informat Sci & Engn, Jinan 250022, Peoples R China
[2] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan, Peoples R China
[3] Shandong Prov Key Lab Software Engn, Jinan, Peoples R China
[4] Imperial Coll London, London, England
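Funding
National Natural Science Foundation of China
Keywords
collaborative deep learning; gradient sharing; machine learning; privacy-preserving technique
DOI
10.1002/int.22241
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract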
Training on large-scale data is vital to the generalization performance of deep learning (DL) models. However, collecting data directly increases the risk of privacy disclosure, particularly in sensitive fields such as healthcare, finance, and genomics. To protect training data privacy, collaborative deep learning (CDL) has been proposed to enable joint training by multiple data owners while providing a reliable privacy guarantee. However, recent studies have shown that CDL is vulnerable to several attacks that can reveal sensitive information about the original training data. One of the most powerful of these attacks exploits the leakage from gradient sharing during the collaborative training process. In this study, we present a new CDL framework, PrivateDL, to effectively protect private training data against leakage from gradient sharing. Unlike the conventional training process, which trains directly on private data, PrivateDL transfers relational knowledge from sensitive data to public data in a privacy-preserving way and enables participants to jointly learn local models on the public data with noise-preserving labels. In this way, PrivateDL establishes a privacy gap between the local models and the private datasets, thereby ensuring privacy against attacks launched on the local models through gradient sharing. Moreover, we propose a new algorithm called Distributed Aggregation Stochastic Gradient Descent, which is designed to improve the efficiency and accuracy of CDL, especially in the asynchronous training mode. Experimental results demonstrate that PrivateDL preserves data privacy with reasonable performance overhead.
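The abstract does not describe the gradient-leakage attack itself. To show why shared gradients can expose training data at all, here is a minimal sketch, assuming a single fully connected layer y = Wx + b trained on one example with squared-error loss. It illustrates a well-known property of such layers and is not the attack model or defense used in PrivateDL; the helper names `forward`, `gradients`, and `reconstruct_input` and all numeric values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Single fully connected layer: y = W x + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def gradients(W: Matrix, b: Vector, x: Vector, target: Vector) -> Tuple[Matrix, Vector]:
    """Gradients of L = 0.5 * ||W x + b - target||^2 with respect to W and b.

    In gradient-sharing CDL, these are the values a participant would upload.
    """
    y = forward(W, b, x)
    delta = [yi - ti for yi, ti in zip(y, target)]   # dL/dy
    grad_W = [[d * xj for xj in x] for d in delta]   # dL/dW = delta x^T (outer product)
    grad_b = list(delta)                             # dL/db = delta
    return grad_W, grad_b


def reconstruct_input(grad_W: Matrix, grad_b: Vector, eps: float = 1e-12) -> Vector:
    """Recover x from the shared gradients alone.

    Because grad_W[i][j] = delta[i] * x[j] and grad_b[i] = delta[i],
    x[j] = grad_W[i][j] / grad_b[i] for any row i with a nonzero bias gradient.
    """
    for row, g in zip(grad_W, grad_b):
        if abs(g) > eps:
            return [v / g for v in row]
    raise ValueError("all bias gradients are zero; the input cannot be recovered this way")


if __name__ == "__main__":
    W = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]
    b = [0.1, -0.3]
    x_private = [1.0, 2.0, 3.0]  # hypothetical private training example
    target = [0.0, 1.0]
    shared_W, shared_b = gradients(W, b, x_private, target)
    # The "attacker" sees only shared_W and shared_b, yet recovers x_private
    # up to floating-point rounding.
    print(reconstruct_input(shared_W, shared_b))
```

In this toy setting a single shared gradient is enough to reconstruct the private input. Per the abstract, PrivateDL counters this class of attack by having participants train local models on public data with noise-preserving labels, so that gradients shared from those models are computed on public rather than private samples.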
Pages: 1262-1279
Number of pages: 18