Add noise to remove noise: Local differential privacy for feature selection

Cited by: 12
Authors
Alishahi, Mina [1 ]
Moghtadaiee, Vahideh [2 ]
Navidan, Hojjat [3 ]
Affiliations
[1] Open Univ, Dept Comp Sci, Heerlen, Netherlands
[2] Shahid Beheshti Univ, Cyberspace Res Inst, Tehran, Iran
[3] Univ Tehran, Sch Elect & Comp Engn, Tehran, Iran
Keywords
Feature selection; Feature ranking; Privacy preserving; Local differential privacy; Machine learning; Utility
DOI
10.1016/j.cose.2022.102934
CLC classification
TP [Automation Technology, Computer Technology]
Subject classification
0812
Abstract
Feature selection has become significantly important for data analysis. It selects the most informative features describing the data in order to filter out the noise, complexity, and over-fitting caused by less relevant features. Accordingly, feature selection improves the predictors' accuracy, enables them to be trained faster and more cost-effectively, and provides a better understanding of the underlying data. While plenty of practical solutions have been proposed in the literature to identify the most discriminating features describing a dataset, an understanding of feature selection over privacy-sensitive data in the absence of a trusted party is still missing. The design of such a framework is especially important in our modern society, where each individual with Internet access can simultaneously play the role of a data provider and a data-analysis beneficiary. In this study, we propose a novel feature selection framework based on Local Differential Privacy (LDP), named LDP-FS, which estimates the importance of features over securely protected data while protecting the confidentiality of each individual's data before it leaves the user's device. The performance of LDP-FS in terms of scoring and ordering the features is assessed by investigating the impact of dataset properties, privacy mechanisms, privacy levels, and feature selection techniques on this framework. The accuracy of classifiers trained on the subset of features selected by LDP-FS is also presented. Our experimental results demonstrate the effectiveness and efficiency of the proposed framework. (c) 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )
Pages: 22
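The abstract summarizes LDP-FS only at a high level and does not spell out its mechanism. As an illustrative sketch of the general idea (not the authors' actual protocol), the Python example below scores one categorical feature under LDP: each user perturbs the encoded (feature value, label) pair with generalized randomized response before it leaves the device, and the aggregator debiases the reported frequencies to compute a mutual-information relevance score. The privacy budget epsilon, the domain sizes, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def grr_perturb(value, domain_size, epsilon, rng):
    """Generalized randomized response: keep the true value with probability p,
    otherwise report one of the other (domain_size - 1) values uniformly."""
    p = np.exp(epsilon) / (np.exp(epsilon) + domain_size - 1)
    if rng.random() < p:
        return value
    other = rng.integers(domain_size - 1)
    return other if other < value else other + 1

def debiased_frequencies(reports, domain_size, epsilon):
    """Unbiased estimate of the true value frequencies from perturbed reports."""
    n = len(reports)
    p = np.exp(epsilon) / (np.exp(epsilon) + domain_size - 1)
    q = 1.0 / (np.exp(epsilon) + domain_size - 1)
    counts = np.bincount(reports, minlength=domain_size)
    return (counts / n - q) / (p - q)

def ldp_mutual_information(feature, label, k_f, k_y, epsilon, rng):
    """Score one feature: each user encodes the (feature, label) pair as a single
    categorical value, perturbs it locally with GRR, and the aggregator estimates
    the joint distribution to compute a mutual-information relevance score."""
    joint = feature * k_y + label                          # encode the pair
    reports = np.array([grr_perturb(v, k_f * k_y, epsilon, rng) for v in joint])
    p_xy = debiased_frequencies(reports, k_f * k_y, epsilon).reshape(k_f, k_y)
    p_xy = np.clip(p_xy, 1e-12, None)
    p_xy /= p_xy.sum()                                     # project back to a distribution
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    return float((p_xy * np.log(p_xy / np.outer(p_x, p_y))).sum())

# Toy usage: rank two synthetic binary features under epsilon = 2.
rng = np.random.default_rng(0)
n = 20000
label = rng.integers(2, size=n)
informative = (label ^ (rng.random(n) < 0.1)).astype(int)  # correlated with the label
noise_feat = rng.integers(2, size=n)                       # independent of the label
for name, feat in [("informative", informative), ("noise", noise_feat)]:
    print(name, round(ldp_mutual_information(feat, label, 2, 2, epsilon=2.0, rng=rng), 4))
```

Under this sketch the informative feature receives a visibly higher estimated score than the noise feature, which is the behaviour a privacy-preserving feature ranking framework such as LDP-FS aims to retain despite the local perturbation.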