Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review

Cited by: 16
Authors
Hewage, U. H. W. A. [1]
Sinha, R. [1]
Naeem, M. Asif [2]
Affiliations
[1] Auckland Univ Technol, Sch Engn Comp & Math Sci, Auckland 1010, New Zealand
[2] Natl Univ Comp & Emerging Sci, Dept Comp Sci, Islamabad, Pakistan
Keywords
Privacy-preserving data mining; Data streams; Accuracy-privacy trade-off; Data privacy; DATA PERTURBATION; DATA DISTORTION; PRESERVATION; UTILITY; ROTATION; STRATEGY
DOI
10.1007/s10462-023-10425-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study investigates existing input privacy-preserving data mining (PPDM) methods and privacy-preserving data stream mining (PPDSM) methods, including their strengths and weaknesses. A further analysis was carried out to determine to what extent existing PPDM/PPDSM methods address the trade-off between data mining accuracy and data privacy, which is a significant concern in the area. The systematic literature review was conducted using data extracted from 104 primary studies from five reputable databases. The scope of the study was defined using three research questions and adequate inclusion and exclusion criteria. According to the results of our study, we divided existing PPDM methods into four categories: perturbation, non-perturbation, secure multi-party computation, and combinations of PPDM methods. These methods have different strengths and weaknesses concerning accuracy, privacy, time consumption, and other factors. Data stream mining must face additional challenges such as high volume, high speed, and computational complexity. Fewer techniques have been proposed for PPDSM than for PPDM. We categorized PPDSM techniques into three categories: perturbation, non-perturbation, and other. Most PPDM methods can be applied to classification, followed by clustering and association rule mining. It was observed that numerous studies have identified and discussed the accuracy-privacy trade-off. However, there is a lack of studies providing solutions to the issue, especially in PPDSM.
Pages: 10427-10464
Page count: 38