Preserving Model Privacy for Machine Learning in Distributed Systems

Cited by: 38
Authors
Jia, Qi [1]
Guo, Linke [1]
Jin, Zhanpeng [1]
Fang, Yuguang [2]
Affiliations
[1] SUNY Binghamton, Dept Elect & Comp Engn, Binghamton, NY 13902 USA
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Funding
US National Science Foundation
Keywords
Machine learning; privacy preservation; data classification; model evaluation; AUTHENTICATION SYSTEM; NETWORKS;
DOI
10.1109/TPDS.2018.2809624
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Machine Learning based data classification is a widely used data mining technique. By learning massive data collected from the real world, data classification helps learners discover hidden data patterns. These hidden data patterns are represented by the learned model in different machine learning schemes. Based on such models, a user can classify whether the new incoming data belongs to an existing class; or, multiple entities may test the similarity of their datasets. However, due to data locality and privacy concerns, it is infeasible for large-scale distributed systems to share each individual's datasets for classifying or testing. On the one hand, the learned model is an entity's private asset and may leak private information, which should be well protected from all other non-collaborative entities. On the other hand, the new incoming data may contain sensitive information which cannot be disclosed directly for classification. To address the above privacy issues, we propose an approach to preserve the model privacy of the data classification and similarity evaluation for distributed systems. With our scheme, neither new data nor learned models are directly revealed during the classification and similarity evaluation procedures. Based on extensive real-world experiments, we have evaluated the privacy preservation, feasibility, and efficiency of the proposed scheme.
Pages: 1808-1822
Page count: 15
Related papers
44 in total
  • [1] Aggarwal Charu C, 2008, A general survey of privacy-preserving data mining models and algorithms
  • [2] Alpaydin E, 2014, ADAPT COMPUT MACH LE, P115
  • [3] [Anonymous], 2017, DATA LEAKAGE HEALTHC
  • [4] [Anonymous], 2007, Tech. rep
  • [5] Machine Learning Classification over Encrypted Data
    Bost, Raphael
    Popa, Raluca Ada
    Tu, Stephen
    Goldwasser, Shafi
    [J]. 22ND ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2015), 2015,
  • [6] Distributed optimization and statistical learning via the alternating direction method of multipliers
    Boyd S.
    Parikh N.
    Chu E.
    Peleato B.
    Eckstein J.
    [J]. Foundations and Trends in Machine Learning, 2010, 3 (01): 1 - 122
  • [7] LIBSVM: A Library for Support Vector Machines
    Chang, Chih-Chung
    Lin, Chih-Jen
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
  • [8] Chu CK, 2005, LECT NOTES COMPUT SC, V3386, P172
  • [9] Forero PA, 2010, J MACH LEARN RES, V11, P1663
  • [10] Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
    Fredrikson, Matt
    Jha, Somesh
    Ristenpart, Thomas
    [J]. CCS'15: PROCEEDINGS OF THE 22ND ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2015: 1322 - 1333