Representing a Model for the Anonymization of Big Data Stream Using In-Memory Processing

Cited by: 0

Authors:

Shamsinejad E. [1]

Banirostam T. [1]

Pedram M.M. [2]

Rahmani A.M. [3]

Affiliations:

[1] Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran

[2] Department of Electrical and Computer Engineering, Faculty of Engineering, Kharazmi University, Tehran

[3] Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran

Source:

Annals of Data Science | 2025 / Volume 12 / Issue 1

Keywords:

Anonymity; Big data; Confidentiality; Data disclosure; Privacy

DOI:

10.1007/s40745-024-00556-x

Abstract:

In light of the escalating privacy risks in the big data era, this paper introduces an innovative model for the anonymization of big data streams, leveraging in-memory processing within the Spark framework. The approach is founded on the principle of K-anonymity and propels the field forward by critically evaluating various anonymization methods and algorithms, benchmarking their performance with respect to time and space complexities. A distinctive formula for optimized cluster determination in the K-means algorithm is presented, along with a novel tuple expiration time strategy for the efficient purging of clusters. The integration of these components into Spark’s RDD and MLlib modules results in a significant decrease in execution time and data loss rates, even with increasing data volumes. The paper’s notable contributions are its methodological advancements that offer a robust, scalable solution for data anonymization, safeguarding user privacy without sacrificing data utility or processing efficiency. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.

Pages: 223-252

Page count: 29
