SecureNLP: A System for Multi-Party Privacy-Preserving Natural Language Processing

Cited by: 60
Authors
Feng, Qi [1, 2]
He, Debiao [1, 2]
Liu, Zhe [3]
Wang, Huaqun [4]
Choo, Kim-Kwang Raymond [5]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Minist Educ, Key Lab Aerosp Informat Secur & Trusted Comp, Wuhan 430072, Peoples R China
[2] State Key Lab Cryptol, Beijing 100878, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[4] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Coll Comp, Nanjing 210003, Peoples R China
[5] Univ Texas San Antonio, Dept Informat Syst & Cyber Secur, San Antonio, TX 78249 USA
Funding
National Natural Science Foundation of China; U.S. National Science Foundation
Keywords
Secure multi-party computation; natural language processing; seq2seq with attention; long short-term memory
DOI
10.1109/TIFS.2020.2997134
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Natural language processing (NLP) allows a computer program to understand human language as it is spoken, and is deployed in a growing number of applications, such as machine translation, sentiment analysis, and electronic voice assistants. While information obtained from different sources can enhance the accuracy of NLP models, collecting such massive data also has privacy implications. Thus, in this paper, we design a privacy-preserving system, SecureNLP, focusing on the instance of a recurrent neural network (RNN)-based sequence-to-sequence with attention model for neural machine translation. Specifically, for non-linear functions such as sigmoid and tanh, we design two efficient distributed protocols using secure multi-party computation (MPC), which are used to carry out the respective tasks in SecureNLP. We also prove the security of these two protocols (i.e., privacy-preserving long short-term memory network PrivLSTM, and privacy-preserving sequence to sequence transformation PrivSEQ2SEQ) in the semi-honest adversary model, in the sense that an honest-but-curious adversary cannot learn anything else from the messages it receives from other parties. The proposed system is implemented in C++ and Python, and the findings from the evaluation demonstrate the utility of the protocols in cross-domain NLP.
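As an illustration of the MPC setting the abstract refers to, the following is a minimal sketch of additive secret sharing, a common building block in such systems. It is not the paper's sigmoid or tanh protocol, and the ring size `MODULUS` and the helper names `share`, `reconstruct`, and `add_shared` are assumptions made for this example only.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size for illustration; not taken from the paper


def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no messages are exchanged."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


# Hypothetical usage: two secret inputs are shared, added share-wise, then opened.
x_shares = share(20)
y_shares = share(22)
z_shares = add_shared(x_shares, y_shares)
result = reconstruct(z_shares)
print(result)
```

Each individual share is uniformly random, so a single party holding one share learns nothing about the secret. Non-linear functions such as sigmoid and tanh cannot be evaluated share-wise in this way, which is why dedicated protocols like those described in the abstract are needed.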
Pages: 3709-3721
Number of pages: 13