Differentially Private Random Forest with High Utility

被引:40
作者
Rana, Santu [1 ]
Gupta, Sunil Kumar [1 ]
Venkatesh, Svetha [1 ]
机构
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia
来源
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2015年
关键词
differential privacy; decision trees; random forest; privacy preserving data mining;
D O I
10.1109/ICDM.2015.76
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Privacy-preserving data mining has become an active focus of the research community in the domains where data are sensitive and personal in nature. We propose a novel random forest algorithm under the framework of differential privacy. Unlike previous works that strictly follow differential privacy and keep the complete data distribution approximately invariant to change in one data instance, we only keep the necessary statistics (e.g. variance of the estimate) invariant. This relaxation results in significantly higher utility. To realize our approach, we propose a novel differentially private decision tree induction algorithm and use them to create an ensemble of decision trees. We also propose feasible adversary models to infer about the attribute and class label of unknown data in presence of the knowledge of all other data. Under these adversary models, we derive bounds on the maximum number of trees that are allowed in the ensemble while maintaining privacy. We focus on binary classification problem and demonstrate our approach on four real-world datasets. Compared to the existing privacy preserving approaches we achieve significantly higher utility.
引用
收藏
页码:955 / 960
页数:6
相关论文
共 26 条
  • [1] Agrawal R, 2000, SIGMOD REC, V29, P439, DOI 10.1145/335191.335438
  • [2] SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation
    Blewitt, Marnie E.
    Gendrel, Anne-Valerie
    Pang, Zhenyi
    Sparrow, Duncan B.
    Whitelaw, Nadia
    Craig, Jeffrey M.
    Apedaile, Anwyn
    Hilton, Douglas J.
    Dunwoodie, Sally L.
    Brockdorff, Neil
    Kay, Graham F.
    Whitelaw, Emma
    [J]. NATURE GENETICS, 2008, 40 (05) : 663 - 669
  • [3] Bojarski M, 2014, ARXIV14106973
  • [4] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [5] Chaudhuri K., 2008, Advances in neural information processing systems, P289
  • [6] Chaudhuri K, 2011, J MACH LEARN RES, V12, P1069
  • [7] AUDITING AND INFERENCE CONTROL IN STATISTICAL DATABASES
    CHIN, FY
    OZSOYOGLU, G
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1982, 8 (06) : 574 - 582
  • [8] Dowd J, 2006, LECT NOTES COMPUT SC, V3995, P145
  • [9] Dwork C., 2011, ENCY CRYPTOGRAPHY SE, P338, DOI 10.1007/978-1-4419-5906-5_752
  • [10] Calibrating noise to sensitivity in private data analysis
    Dwork, Cynthia
    McSherry, Frank
    Nissim, Kobbi
    Smith, Adam
    [J]. THEORY OF CRYPTOGRAPHY, PROCEEDINGS, 2006, 3876 : 265 - 284