Add noise to remove noise: Local differential privacy for feature selection

Cited by: 12
Authors
Alishahi, Mina [1 ]
Moghtadaiee, Vahideh [2 ]
Navidan, Hojjat [3 ]
Affiliations
[1] Open Univ, Dept Comp Sci, Heerlen, Netherlands
[2] Shahid Beheshti Univ, Cyberspace Res Inst, Tehran, Iran
[3] Univ Tehran, Sch Elect & Comp Engn, Tehran, Iran
Keywords
Feature selection; Feature ranking; Privacy preserving; Local differential privacy; Machine learning; Utility
DOI
10.1016/j.cose.2022.102934
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection has become significantly important for data analysis. It selects the most informative features describing the data to filter out the noise, complexity, and over-fitting caused by less relevant features. Accordingly, feature selection improves the predictors' accuracy, enables them to be trained faster and more cost-effectively, and provides a better understanding of the underlying data. While plenty of practical solutions have been proposed in the literature to identify the most discriminating features describing a dataset, an understanding of feature selection over privacy-sensitive data in the absence of a trusted party is still missing. The design of such a framework is especially important in our modern society, where every individual with Internet access can simultaneously play the role of a data provider and a data-analysis beneficiary. In this study, we propose a novel feature selection framework based on Local Differential Privacy (LDP), named LDP-FS, which estimates the importance of features over securely protected data while protecting the confidentiality of each individual's data before it leaves the user's device. The performance of LDP-FS in terms of scoring and ordering the features is assessed by investigating the impact of dataset properties, privacy mechanisms, privacy levels, and feature selection techniques on this framework. The accuracy of classifiers trained on the subset of features selected by LDP-FS is also presented. Our experimental results demonstrate the effectiveness and efficiency of the proposed framework. (c) 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
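Illustrative note (not part of the original record): the abstract describes a setting in which each user perturbs their own data locally before it leaves the device, and an aggregator then scores features from the noisy reports. The Python sketch below is a minimal, hypothetical illustration of that general idea, assuming a k-ary randomized response mechanism and a mutual-information feature score; it is not the authors' LDP-FS implementation, and all names and parameters (krr_perturb, krr_estimate_counts, epsilon = 1.0) are invented for this example.

import math
import random
from collections import Counter

def krr_perturb(value, domain, epsilon):
    # k-ary randomized response: keep the true value with probability
    # p = e^eps / (e^eps + k - 1); otherwise report a uniformly random other value.
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return value
    return random.choice([v for v in domain if v != value])

def krr_estimate_counts(reports, domain, epsilon):
    # Unbiased count estimate: c_hat = (c_obs - n*q) / (p - q), with q = (1 - p) / (k - 1).
    n, k = len(reports), len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1.0 - p) / (k - 1)
    observed = Counter(reports)
    return {v: (observed.get(v, 0) - n * q) / (p - q) for v in domain}

# Toy data: one binary feature correlated with a binary class label,
# held on 5000 simulated user devices and never shared in the clear.
random.seed(0)
epsilon = 1.0
joint_domain = [(f, y) for f in (0, 1) for y in (0, 1)]
raw = []
for _ in range(5000):
    f = random.randint(0, 1)
    y = f if random.random() < 0.8 else 1 - f
    raw.append((f, y))

# Each client perturbs its (feature value, label) pair locally, then reports it.
reports = [krr_perturb(pair, joint_domain, epsilon) for pair in raw]

# Aggregator: de-bias the joint counts and compute a mutual-information score
# for the feature, which could then be used to rank features.
n = len(reports)
joint = krr_estimate_counts(reports, joint_domain, epsilon)
p_f = {f: sum(joint[(f, y)] for y in (0, 1)) / n for f in (0, 1)}
p_y = {y: sum(joint[(f, y)] for f in (0, 1)) / n for y in (0, 1)}
mi = 0.0
for f in (0, 1):
    for y in (0, 1):
        p_fy = max(joint[(f, y)] / n, 1e-12)
        mi += p_fy * math.log(p_fy / max(p_f[f] * p_y[y], 1e-12))
print("LDP estimate of feature-label mutual information:", round(mi, 4))

As epsilon grows, the perturbation probability approaches zero and the estimated score approaches its non-private value; the paper's experiments study precisely how feature scores and rankings degrade as the privacy level tightens.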
Pages: 22
Related papers
50 records in total
[11] Chaudhuri K., 2011. Journal of Machine Learning Research, 12:1069.
[12] Cormode G., 2021. arXiv preprint.
[13] Ding B., 2017. Advances in Neural Information Processing Systems, 30.
[14] Dougherty J., 1995. Proceedings of the Twelfth International Conference on Machine Learning, 194.
[15] Dwork C., 2006. Lecture Notes in Computer Science, 4004:486.
[16] Erlingsson U., Pihur V., Korolova A., 2014. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS'14: Proceedings of the 21st ACM Conference on Computer and Communications Security, 1054-1067.
[17] Fan W., He J., Guo M., Li P., Han Z., Wang R., 2020. Privacy preserving classification on local differential privacy in data centers. Journal of Parallel and Distributed Computing, 135:70-82.
[18] Fathalizadeh A., Moghtadaiee V., Alishahi M., 2022. On the privacy protection of indoor location dataset using anonymization. Computers & Security, 117.
[19] Gu X., 2020. arXiv:1911.01402.
[20] Guyon I., 2003. Journal of Machine Learning Research, 3:1157. DOI: 10.1162/153244303322753616.