Differentially Private Feature Selection for Data Mining

Cited by: 4
Authors
Anandan, Balamurugan [1, 2]
Clifton, Chris [1, 2]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Purdue Univ, CERIAS, W Lafayette, IN 47907 USA
Source
IWSPA '18: PROCEEDINGS OF THE FOURTH ACM INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS | 2018
Keywords
Differential privacy; sensitivity; data mining; classification; decision trees; naive bayes; feature selection; privacy preserving data mining;
DOI
10.1145/3180445.3180452
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One approach to analysis of private data is epsilon-differential privacy, a randomization-based approach that protects individual data items by injecting carefully limited noise into results. A challenge in applying this to private data analysis is that the noise added to the feature parameters is directly proportional to the number of parameters learned. While careful feature selection would alleviate this problem, the process of feature selection itself can reveal private information, requiring the application of differential privacy to the feature selection process. In this paper, we analyze the sensitivity of various feature selection techniques used in data mining and show that some of them are not suitable for differentially private analysis due to high sensitivity. We give experimental results showing the value of using low sensitivity feature selection techniques. We also show that the same concepts can be used to improve differentially private decision trees.
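The abstract's point that noise grows with the number of learned parameters can be illustrated with a minimal sketch. It assumes the Laplace mechanism, count queries with sensitivity 1, and a privacy budget split evenly across features under sequential composition; these choices and the function names are illustrative assumptions, not the method analyzed in the paper.

```python
import random


def laplace_scale(num_features, epsilon, sensitivity=1.0):
    """Laplace scale per released value when epsilon is split evenly.

    Assumption: each feature's statistic gets epsilon / num_features,
    so the scale sensitivity / (epsilon / num_features) grows linearly
    with the number of features.
    """
    return sensitivity * num_features / epsilon


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Release one noisy count per feature under a total budget epsilon."""
    rng = rng or random.Random()
    scale = laplace_scale(len(counts), epsilon, sensitivity)
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    all_features = [120, 95, 40, 310, 8, 77, 150, 63, 22, 201]  # hypothetical counts
    selected = all_features[:3]  # pretend feature selection kept three features
    print("scale with 10 features:", laplace_scale(len(all_features), epsilon=1.0))
    print("scale with 3 features: ", laplace_scale(len(selected), epsilon=1.0))
    print("noisy release:", noisy_counts(selected, epsilon=1.0, rng=rng))
```

With the same total budget, keeping three features instead of ten reduces the per-value noise scale proportionally. The sketch ignores the budget that a differentially private selection step would itself consume.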
Pages: 43-53
Page count: 11
Related papers
50 in total
  • [31] The use of feature selection based data mining methods in biomarkers identification of disease
    Zhao, Huihui
    Chen, Jianxin
    Liu, Y.
    Shi, Qi
    Yang, Yi
    Zheng, Chenglong
    Hou, Na
    Wang, Juan
    Zhao, Lingyan
    Wang, Wei
    CEIS 2011, 2011, 15
  • [32] Benchmarking relief-based feature selection methods for bioinformatics data mining
    Urbanowicz, Ryan J.
    Olson, Randal S.
    Schmit, Peter
    Meeker, Melissa
    Moore, Jason H.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 85 : 168 - 188
  • [33] A GA-Based Wrapper Feature Selection for Animal Breeding Data Mining
    Unold, Olgierd
    Dobrowolski, Maciej
    Maciejewski, Henryk
    Skrobanek, Pawel
    Walkowicz, Ewa
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, 2012, 7209 : 200 - 209
  • [34] Comparison of data mining algorithms in remote sensing using Lidar data fusion and feature selection
    Rozario, Papia
    Gomes, Rahul
    2021 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2021, : 236 - 243
  • [35] Selection and Verification of Privacy Parameters for Local Differentially Private Data Aggregation
    Shahani, Snehkumar
    Abraham, Jibi
    Venkateswaran, R.
    5TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND DATA MINING (ICISDM 2021), 2021, : 84 - 89
  • [36] A differentially private distributed data mining scheme with high efficiency for edge computing
    Sun, Xianwen
    Xu, Ruzhi
    Wu, Longfei
    Guan, Zhitao
    JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2021, 10 (01):
  • [37] A differentially private distributed data mining scheme with high efficiency for edge computing
    Xianwen Sun
    Ruzhi Xu
    Longfei Wu
    Zhitao Guan
    Journal of Cloud Computing, 10
  • [38] Indonesian Islamic moral incentives in credit card debt repayment: A feature selection using various data mining
    Caraka, Rezzy Eko
    Hudaefi, Fahmi Ali
    Ugiana, Prana
    Toharudin, Toni
    Tyasti, Avia Enggar
    Goldameir, Noor Ell
    Chen, Rung Ching
    INTERNATIONAL JOURNAL OF ISLAMIC AND MIDDLE EASTERN FINANCE AND MANAGEMENT, 2022, 15 (01) : 100 - 124
  • [39] PrivPfC: differentially private data publication for classification
    Dong Su
    Jianneng Cao
    Ninghui Li
    Min Lyu
    The VLDB Journal, 2018, 27 : 201 - 223
  • [40] PrivPfC: differentially private data publication for classification
    Su, Dong
    Cao, Jianneng
    Li, Ninghui
    Lyu, Min
    VLDB JOURNAL, 2018, 27 (02) : 201 - 223