Privacy Preserving Vertical Federated Learning for Tree-based Models

Cited by: 91
Authors
Wu, Yuncheng [1]
Cai, Shaofeng [1]
Xiao, Xiaokui [1]
Chen, Gang [2]
Ooi, Beng Chin [1]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2020 / Vol. 13 / No. 11
Keywords
Learning systems - Privacy-preserving techniques
DOI
10.14778/3407790.3407811
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which addresses scenarios where (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy-preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than what the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise m - 1 out of m clients. We further identify two privacy leakages that arise when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT), by treating single decision trees as building blocks. Theoretical and experimental analyses suggest that Pivot is efficient for the privacy achieved.
Pages: 2090-2103
Page count: 14