Privacy-Preserving Split Learning for Large-Scaled Vision Pre-Training

Cited by: 8
Authors
Wang, Zhousheng [1 ]
Yang, Geng [2 ,3 ]
Dai, Hua [2 ,3 ]
Rong, Chunming [4 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Telecommun & Informat Engn, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China
[3] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
[4] Univ Stavanger, Dept Elect Engn & Comp Sci, N-4036 Stavanger, Norway
Funding
National Natural Science Foundation of China
Keywords
Computational modeling; Training; Federated learning; Privacy; Data models; Transformers; Task analysis; Split learning; self pre-training; differential privacy; masked autoencoder; security
DOI
10.1109/TIFS.2023.3243490
CLC number (Chinese Library Classification)
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Growing societal concerns about data privacy are gradually placing restrictions on computer vision research. Several collaboration-based vision learning methods have recently emerged, e.g., federated learning and split learning. These methods keep user data on local devices and perform training by uploading only gradients, parameters, or activations. However, there is little research on collaborative learning with state-of-the-art, large-scale models, mainly because of the high computation and communication overheads of the latest models; training them may still be infeasible on users' terminals. In this paper, we make a first attempt at pre-training on sensitive images with large-scale models in the collaborative learning scenario, and propose a new lightweight, mask-based framework for split learning, Masked Split Learning (MaskSL). We further ensure its security with differential privacy. In addition, we analytically model the computation and communication overheads of several collaborative learning approaches to illustrate the advantages of our scheme. Finally, we design and conduct a series of experiments on real-world datasets, e.g., face recognition and medical image classification tasks, to demonstrate the performance of MaskSL.
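To make the pipeline described in the abstract more concrete, below is a minimal PyTorch sketch of the general idea: MAE-style masking on the client so that only a fraction of patch tokens is embedded locally, Gaussian-mechanism perturbation of the smashed activations before they leave the device, and a heavy Transformer encoder on the server. Every name, shape, and hyperparameter here (ClientHead, dp_perturb, the 75% mask ratio, the clipping and noise constants) is an illustrative assumption, not the authors' implementation; the actual MaskSL architecture and privacy accounting are specified in the paper.

```python
# Hypothetical sketch of masked split learning with DP on smashed activations.
# Assumed shapes/constants are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class ClientHead(nn.Module):
    """Lightweight client-side module: patchify, randomly mask, embed."""
    def __init__(self, img_size=224, patch=16, dim=768, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Patch embedding as a strided convolution (standard ViT trick).
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, D)
        B, N, D = tokens.shape
        keep = int(N * (1 - self.mask_ratio))
        # Random per-image permutation; keep only the first `keep` tokens.
        idx = torch.rand(B, N, device=x.device).argsort(dim=1)[:, :keep]
        visible = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        return visible  # only ~25% of the tokens ever leave the device

def dp_perturb(act, clip=1.0, sigma=0.5):
    """Clip each token's L2 norm to `clip`, then add Gaussian noise
    (Gaussian mechanism applied to the smashed activations)."""
    norms = act.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    act = act * (clip / norms).clamp(max=1.0)   # scale down, never up
    return act + torch.randn_like(act) * sigma * clip

# Server-side heavy encoder (a stand-in for a large ViT/MAE encoder).
server_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
)

client = ClientHead()
images = torch.randn(4, 3, 224, 224)       # toy batch of "sensitive" images
smashed = dp_perturb(client(images))       # noisy activations cross the wire
features = server_encoder(smashed)         # heavy computation on the server
print(features.shape)                      # torch.Size([4, 49, 768])
```

The masking is what makes such a scheme lightweight: with a 75% mask ratio, only about a quarter of the tokens are embedded and transmitted, cutting both the client's computation and the communication volume, while the clipped Gaussian noise bounds what the server can infer about the raw images.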
Pages: 1539-1553
Number of pages: 15