Differentially Private Feature Selection for Data Mining

Cited by: 4
Authors
Anandan, Balamurugan [1, 2]
Clifton, Chris [1, 2]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Purdue Univ, CERIAS, W Lafayette, IN 47907 USA
Source
IWSPA '18: PROCEEDINGS OF THE FOURTH ACM INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS | 2018
Keywords
Differential privacy; sensitivity; data mining; classification; decision trees; naive bayes; feature selection; privacy preserving data mining;
DOI
10.1145/3180445.3180452
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
One approach to analysis of private data is epsilon-differential privacy, a randomization-based approach that protects individual data items by injecting carefully limited noise into results. A challenge in applying this to private data analysis is that the noise added to the feature parameters is directly proportional to the number of parameters learned. While careful feature selection would alleviate this problem, the process of feature selection itself can reveal private information, requiring the application of differential privacy to the feature selection process. In this paper, we analyze the sensitivity of various feature selection techniques used in data mining and show that some of them are not suitable for differentially private analysis due to high sensitivity. We give experimental results showing the value of using low sensitivity feature selection techniques. We also show that the same concepts can be used to improve differentially private decision trees.
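The abstract's point that noise grows with the number of parameters learned can be illustrated with a minimal sketch. It is not the authors' method: it assumes the standard Laplace mechanism, an equal split of the privacy budget across queries (sequential composition), and count queries with sensitivity 1. The function names `laplace_noise`, `per_query_scale`, and `noisy_counts` are hypothetical and chosen only for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def per_query_scale(sensitivity, epsilon, num_queries):
    # Assumption: the total budget epsilon is split equally, so each
    # query gets epsilon / num_queries and its Laplace scale is
    # sensitivity / (epsilon / num_queries).
    return sensitivity * num_queries / epsilon


def noisy_counts(counts, epsilon, sensitivity=1.0, seed=0):
    # Release one noisy value per feature parameter (here: raw counts).
    rng = random.Random(seed)
    scale = per_query_scale(sensitivity, epsilon, len(counts))
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    # Keeping 10 features instead of 100 cuts the per-parameter
    # noise scale by a factor of 10 under this budget split.
    print(per_query_scale(1.0, 1.0, 100))
    print(per_query_scale(1.0, 1.0, 10))
    print(noisy_counts([120, 45, 300], epsilon=1.0))
```

Under these assumptions the noise scale is linear in the number of released parameters, which is why selecting fewer features helps. The selection step itself would also need to be differentially private, as the abstract notes, and this sketch does not cover that part.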
Pages: 43 - 53
Page count: 11
Related papers
50 in total
  • [41] Classification of High Dimensional Data Using Filtration Attribute Evaluation Feature Selection Method of Data mining
    Veeraswamy, Ammisetty
    Babu, Ammisetty Mahesh
    2019 4TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER TECHNOLOGIES AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2019, : 8 - 12
  • [42] Mining interpretable rules with MCRM: A novel rule mining algorithm with inherent feature selection and discretization
    Khosravi, Mohammadreza
    Basiri, Alireza
    INFORMATION SCIENCES, 2025, 698
  • [43] MaskedPainter: Feature selection for microarray data analysis
    Apiletti, Daniele
    Baralis, Elena
    Bruno, Giulia
    Fiori, Alessandro
    INTELLIGENT DATA ANALYSIS, 2012, 16 (04) : 717 - 737
  • [44] Lightweight Feature Selection Methods Based on Standardized Measure of Dispersion for Mining Big Data
    Fong, Simon
    Biuk-Aghai, Robert P.
    Si, Yain-Whar
    2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2016, : 553 - 559
  • [45] Fault diagnosis on material handling system using feature selection and data mining techniques
    Demetgul, M.
    Yildiz, K.
    Taskin, S.
    Tansel, I. N.
    Yazicioglu, O.
    MEASUREMENT, 2014, 55 : 15 - 24
  • [46] Feature selection and risk prediction for patients with coronary artery disease using data mining
    Md Idris, Nashreen
    Chiam, Yin Kia
    Varathan, Kasturi Dewi
    Wan Ahmad, Wan Azman
    Chee, Kok Han
    Liew, Yih Miin
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2020, 58 (12) : 3123 - 3140
  • [47] Efficient genetic algorithm based data mining using feature selection with Hausdorff distance
    Sikora R.
    Piramuthu S.
    Information Technology and Management, 2005, 6 (4) : 315 - 331
  • [48] Feature selection and risk prediction for patients with coronary artery disease using data mining
    Nashreen Md Idris
    Yin Kia Chiam
    Kasturi Dewi Varathan
    Wan Azman Wan Ahmad
    Kok Han Chee
    Yih Miin Liew
    Medical & Biological Engineering & Computing, 2020, 58 : 3123 - 3140
  • [49] DPARM: Differentially Private Association Rules Mining
    Tsou, Yao-Tung
    Zhen, Hao
    Jiang, Xiyu
    Huang, Yennun
    Kuo, Sy-Yen
    IEEE ACCESS, 2020, 8 : 142131 - 142147
  • [50] Differentially private maximal frequent sequence mining
    Cheng, Xiang
    Su, Sen
    Xu, Shengzhi
    Tang, Peng
    Li, Zhengyi
    COMPUTERS & SECURITY, 2015, 55 : 175 - 192