Representing a Model for the Anonymization of Big Data Stream Using In-Memory Processing

Cited by: 0

Authors:

Shamsinejad E. [1]

Banirostam T. [1]

Pedram M.M. [2]

Rahmani A.M. [3]

Affiliations:

[1] Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran

[2] Department of Electrical and Computer Engineering, Faculty of Engineering, Kharazmi University, Tehran

[3] Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran

Source:

Annals of Data Science | 2025 / Vol. 12 / Issue 1

Keywords:

Anonymity; Big data; Confidentiality; Data disclosure; Privacy

DOI:

10.1007/s40745-024-00556-x

Abstract:

In light of the escalating privacy risks in the big data era, this paper introduces an innovative model for the anonymization of big data streams, leveraging in-memory processing within the Spark framework. The approach is founded on the principle of K-anonymity and propels the field forward by critically evaluating various anonymization methods and algorithms, benchmarking their performance with respect to time and space complexities. A distinctive formula for optimized cluster determination in the K-means algorithm is presented, along with a novel tuple expiration time strategy for the efficient purging of clusters. The integration of these components into Spark’s RDD and MLlib modules results in a significant decrease in execution time and data loss rates, even with increasing data volumes. The paper’s notable contributions are its methodological advancements that offer a robust, scalable solution for data anonymization, safeguarding user privacy without sacrificing data utility or processing efficiency. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.

Pages: 223-252

Page count: 29

