VF2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning

Cited by: 55
Authors
Fu, Fangcheng [1]
Shao, Yingxia
Yu, Lele
Jiang, Jiawei
Xue, Huanran
Tao, Yangyu
Cui, Bin
Affiliations
[1] Peking Univ, Dept Comp Sci, Beijing, Peoples R China
Source
SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2021
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Vertical Federated Learning; Gradient Boosting Decision Tree; PRIVACY; APPROXIMATION; REGRESSION; TREE;
DOI
10.1145/3448016.3457241
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the ever-evolving concerns on privacy protection, vertical federated learning (FL), where participants own non-overlapping features for the same set of instances, is becoming a heated topic since it enables multiple enterprises to strengthen the machine learning models collaboratively with privacy guarantees. Nevertheless, to achieve privacy preservation, vertical FL algorithms involve complicated training routines and time-consuming cryptography operations, leading to slow training speed. This paper explores the efficiency of the gradient boosting decision tree (GBDT) algorithm under the vertical FL setting. Specifically, we introduce VF2Boost, a novel and efficient vertical federated GBDT system. Significant solutions are developed to tackle the major bottlenecks. First, to handle the deficiency caused by frequent mutual-waiting in federated training, we propose a concurrent training protocol to reduce the idle periods. Second, to speed up the cryptography operations, we analyze the characteristics of the algorithm and propose customized operations. Empirical results show that our system can be 12.8-18.9x faster than the existing vertical federated implementations and support much larger datasets.
Pages: 563-576
Number of pages: 14