Research Progress on Privacy-Preserving Techniques in Big Data Computing Environment

Cited by: 0
Authors
Qian W.-J. [1, 2, 3]
Shen Q.-N. [1, 2, 3]
Wu P.-F. [1, 2, 3]
Dong C.-T. [1, 2, 3]
Wu Z.-H. [1, 2, 3]
Affiliations
[1] School of Software and Microelectronics, Peking University, Beijing
[2] National Engineering Research Center for Software Engineering, Peking University, Beijing
[3] Key Laboratory of High Confidence Software Technologies of Ministry of Education, Peking University, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2022 / Vol. 45 / No. 04
Funding
National Natural Science Foundation of China
Keywords
Access pattern hiding; Big data privacy protection; Data interference; Data separation; Hardware enhancement; Secure multi-party computation
DOI
10.11897/SP.J.1016.2022.00669
Abstract
The widespread deployment of distributed big data computing environments in the cloud, such as batch processing, stream computing, and machine learning, has made it convenient for users to process massive amounts of data efficiently, but the privacy issues caused by data breaches are becoming increasingly serious. How to protect private data in big data computing environments deployed on the cloud has become a research hotspot. This paper provides a comprehensive overview of the latest research achievements and progress in big data privacy protection in this field, mainly covering domestic and foreign research work in recent years. First, the participating roles and application scenarios in such environments are introduced. Combining the adversary models of the different roles, we start from the three stages of data input, computation, and output involved in the distributed computing process, and divide the existing privacy issues into three categories: leakage of native individual data during the data input stage, theft of private data by attackers inside the cloud during the data computation stage, and malicious inference of sensitive information by untrusted data consumers (i.e., users who use the cloud computing platform and pay cloud providers for cloud services) during the data output stage. Second, we summarize five corresponding main research directions based on the possible privacy leakage risks under conditions such as plaintext, ciphertext, or trusted hardware protection: privacy protection based on data separation, data interference, secure multi-party computation, hardware enhancement, and access pattern hiding. For each type of privacy protection scheme, the privacy challenges, adversary model, privacy issues, mainstream privacy-preserving techniques, and existing limitations are sorted out and analyzed. Furthermore, the advantages and disadvantages of existing privacy-preserving techniques are compared in terms of privacy, utility, and performance. These techniques have different characteristics and limitations and suit different application scenarios. To protect individual privacy in the data input stage, previous work adopts effective techniques such as data separation, data anonymization, and local differential privacy. To ensure the confidentiality and privacy of sensitive data involved in the computing process, the existing mainstream privacy-preserving solutions are based on secure multi-party computation, hardware enhancement, and access pattern hiding, with main techniques including garbled circuits, secret sharing, homomorphic encryption, Intel Software Guard Extensions (SGX), oblivious random access machine (ORAM), and oblivious shuffle. In addition, privacy leakage may occur during the data output stage. Attackers outside the cloud can use their background knowledge to analyze the output of big data computing, obtain sensitive information that can be traced back to a specific individual, and consequently compromise the privacy of the original input data. To defend against such attackers, data anonymization or differential privacy is effective. Finally, future research trends of privacy-preserving techniques in big data computing environments are discussed. © 2022, Science Press. All rights reserved.
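As an illustration of one technique named in the abstract, the following is a minimal sketch of additive secret sharing, a common building block of secure multi-party computation. It is not taken from the paper: the modulus PRIME and the function names share, reconstruct, and add_shared are assumptions chosen for this example. Each party holds one random-looking share, and the parties can add two shared values locally without any of them seeing the inputs.

```python
import secrets

# Illustrative modulus (assumption): a Mersenne prime large enough for small demo values.
PRIME = 2**61 - 1


def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` (0 <= secret < PRIME) into n additive shares modulo PRIME."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    # The first n-1 shares are uniformly random; the last one makes the sum match.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Each party adds its own two shares locally; no inputs are revealed."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


if __name__ == "__main__":
    salary_a = share(120, 3)
    salary_b = share(45, 3)
    total = reconstruct(add_shared(salary_a, salary_b))
    print(total)
```

Any subset of fewer than all shares is uniformly distributed and so reveals nothing about the secret. Multiplication of shared values needs extra machinery that this sketch leaves out.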
Pages: 669-701
Page count: 32