Efficient approximation and privacy preservation algorithms for real time online evolving data streams

Cited by: 4
Authors
Patil, Rahul A. [1, 2]
Patil, Pramod D. [1]
Affiliations
[1] Dr D Y Patil Inst Technol, Pimpri Pune 411018, Maharashtra, India
[2] Pimpri Chinchwad Coll Engn, Pune 411044, Maharashtra, India
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2024, Vol. 27, Issue 1
Keywords
Approximation; Data streaming; Clustering; k-anonymization; l-diversity; Privacy preservation; ANONYMIZATION;
DOI
10.1007/s11280-024-01244-9
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Mining real-time streaming data is more challenging than mining static data because the streams are continuous, unstructured, and large. Privacy becomes a concern whenever the stream contains sensitive data. In recent years, research on the anonymization of static data has progressed significantly; generalization and suppression are the two typical strategies for anonymizing quasi-identifiers. However, the highly dynamic and potentially unbounded nature of streaming data makes anonymizing it a challenging task. To this end, we propose a novel Efficient Approximation and Privacy Preservation Algorithms (EAPPA) framework that pre-processes live streaming data efficiently and preserves its privacy with minimal Information Loss (IL) and computational cost. Because existing privacy preservation solutions for streaming data suffer from redundant data, we first propose an efficient data approximation technique combined with data pre-processing. We design a Flajolet-Martin (FM) based algorithm, coupled with a data cleaning mechanism, for robust and efficient approximation of the unique elements in the data stream. The periodically approximated and pre-processed streaming data is then fed to the anonymization algorithm. Using adaptive clustering, we propose innovative k-anonymization and l-diversity schemes for data streams. The proposed approach scans the stream to detect and reuse clusters that already fulfill the k-anonymity and l-diversity criteria, which reduces both anonymization time and IL. The experimental results demonstrate the efficiency of the EAPPA framework compared to state-of-the-art methods.
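The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the EAPPA implementation, which this record does not describe in detail. The first part is a textbook Flajolet-Martin estimator of the number of distinct elements, here averaging the bitmap statistic over several salted hash functions. The second part is a simple predicate that checks whether a cluster of records meets k-anonymity (at least k records) and distinct l-diversity (at least l distinct sensitive values). The function names, the choice of BLAKE2b as hash, the record layout, and the `diagnosis` field are all assumptions made for illustration.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant


def _hash64(item, seed):
    """64-bit hash of an item; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        str(item).encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "big")


def _trailing_zeros(x):
    """Number of trailing zero bits in x (capped at 63 for x == 0)."""
    if x == 0:
        return 63
    return (x & -x).bit_length() - 1


def fm_estimate(stream, num_hashes=64):
    """Estimate the number of distinct items in a stream (hypothetical helper).

    One bitmap per hash function records which trailing-zero counts were seen.
    R is the index of the lowest unset bit; the estimate is 2**mean(R) / PHI.
    """
    bitmaps = [0] * num_hashes
    for item in stream:
        for seed in range(num_hashes):
            bitmaps[seed] |= 1 << _trailing_zeros(_hash64(item, seed))

    total_r = 0
    for bitmap in bitmaps:
        r = 0
        while bitmap & (1 << r):
            r += 1
        total_r += r
    return 2 ** (total_r / num_hashes) / PHI


def satisfies_k_l(cluster, k, l, sensitive_key="diagnosis"):
    """True if the cluster has >= k records and >= l distinct sensitive values."""
    distinct_sensitive = {record[sensitive_key] for record in cluster}
    return len(cluster) >= k and len(distinct_sensitive) >= l


if __name__ == "__main__":
    stream = [f"user-{i % 1000}" for i in range(3000)]  # 1000 distinct items
    print("estimated distinct items:", round(fm_estimate(stream)))

    cluster = [
        {"age": "30-39", "zip": "4110**", "diagnosis": "flu"},
        {"age": "30-39", "zip": "4110**", "diagnosis": "asthma"},
        {"age": "30-39", "zip": "4110**", "diagnosis": "flu"},
    ]
    print("cluster reusable for k=3, l=2:", satisfies_k_l(cluster, k=3, l=2))
```

Because the bitmaps only record which bit positions were ever set, repeated items leave the estimate unchanged, which is why the technique suits streams with redundant data. A cluster that passes `satisfies_k_l` is the kind of cluster the abstract describes as a candidate for reuse.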
Pages: 20