Representing a Model for the Anonymization of Big Data Stream Using In-Memory Processing

Authors
Shamsinejad E. [1]
Banirostam T. [1]
Pedram M.M. [2]
Rahmani A.M. [3]
Affiliations
[1] Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran
[2] Department of Electrical and Computer Engineering, Faculty of Engineering, Kharazmi University, Tehran
[3] Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran
Keywords
Anonymity; Big data; Confidentiality; Data disclosure; Privacy
DOI
10.1007/s40745-024-00556-x
Abstract
In light of the escalating privacy risks of the big data era, this paper introduces a model for anonymizing big data streams that relies on in-memory processing in the Spark framework. The approach is based on K-anonymity; the paper critically evaluates various anonymization methods and algorithms and benchmarks their time and space complexity. A distinctive formula for optimized cluster determination in the K-means algorithm is presented, along with a tuple expiration time strategy for efficiently purging clusters. Integrating these components into Spark’s RDD and MLlib modules significantly reduces execution time and data loss rates, even as data volumes grow. The paper’s main contributions are methodological advances that offer a robust, scalable solution for data anonymization, protecting user privacy without sacrificing data utility or processing efficiency.
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
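The abstract does not state the cluster-determination formula or the exact expiration rule, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions. It swaps in sort-based micro-aggregation on a single numeric quasi-identifier in place of the paper's K-means step, runs on plain Python lists rather than Spark RDDs or MLlib, and uses a hypothetical expiration rule: once the oldest buffered tuple has waited `ttl` time units, the whole buffer is clustered into groups of at least K, generalized, and published. `Record`, `microaggregate`, `generalize`, `StreamAnonymizer`, and `ttl` are illustrative names, not the authors' code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    arrival: float   # hypothetical arrival timestamp
    age: int         # numeric quasi-identifier to generalize
    diagnosis: str   # sensitive attribute, published unchanged


def microaggregate(records: List[Record], k: int) -> List[List[Record]]:
    """Sort by the quasi-identifier and cut into groups of at least k records.

    The last group absorbs any remainder, so no group is smaller than k.
    Assumes len(records) >= k. (Swapped in for the paper's K-means step.)
    """
    ordered = sorted(records, key=lambda r: r.age)
    n_groups = len(ordered) // k
    groups = [ordered[i * k:(i + 1) * k] for i in range(n_groups)]
    groups[-1].extend(ordered[n_groups * k:])
    return groups


def generalize(group: List[Record]) -> List[Tuple[str, str]]:
    """Replace every age in the group with the group's "min-max" range."""
    lo = min(r.age for r in group)
    hi = max(r.age for r in group)
    label = f"{lo}-{hi}"
    return [(label, r.diagnosis) for r in group]


class StreamAnonymizer:
    """Buffers tuples; flushes when the oldest buffered tuple has waited ttl.

    The expiration rule here is a hypothetical stand-in for the paper's
    tuple expiration time strategy.
    """

    def __init__(self, k: int, ttl: float):
        self.k = k
        self.ttl = ttl
        self.buffer: List[Record] = []
        self.published: List[Tuple[str, str]] = []
        self.suppressed = 0

    def push(self, record: Record) -> None:
        self.buffer.append(record)
        oldest = min(r.arrival for r in self.buffer)
        if record.arrival - oldest >= self.ttl:
            self.flush()

    def flush(self) -> None:
        if len(self.buffer) < self.k:
            # Too few tuples to form a k-anonymous group: suppress them.
            self.suppressed += len(self.buffer)
        else:
            for group in microaggregate(self.buffer, self.k):
                self.published.extend(generalize(group))
        self.buffer = []


if __name__ == "__main__":
    anon = StreamAnonymizer(k=3, ttl=10.0)
    stream = [(0, 34, "flu"), (1, 29, "asthma"), (2, 41, "flu"),
              (3, 52, "diabetes"), (4, 38, "cold"), (5, 45, "flu"),
              (6, 60, "asthma"), (12, 33, "cold")]
    for arrival, age, dx in stream:
        anon.push(Record(arrival, age, dx))
    anon.flush()  # drain whatever is still buffered at the end of the stream
    # Label "29-34" is shared by 3 tuples and "38-60" by 5 tuples.
    print(Counter(label for label, _ in anon.published))
```

Because consecutive groups in sorted order each hold at least K tuples, every published age range is shared by at least K records, which is the K-anonymity condition for this single quasi-identifier. Tuples that cannot fill a group when they expire are suppressed, trading some data loss for privacy; the paper's actual strategy may handle this case differently.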
Pages: 223–252
Page count: 29