Label Privacy Source Coding in Vertical Federated Learning

Cited by: 0
Authors
Gao, Dashan [1 ,2 ,3 ]
Wan, Sheng [2 ,3 ]
Gu, Hanlin [4 ]
Fan, Lixin [4 ]
Yao, Xin [5 ]
Yang, Qiang [2 ]
Affiliations
[1] Guangdong Prov Key Lab, Guangzhou, Guangdong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[3] Southern Univ Sci & Technol, Shenzhen, Peoples R China
[4] WeBank AI Lab, Shenzhen, Peoples R China
[5] Lingnan Univ, Hong Kong, Peoples R China
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT I, ECML PKDD 2024 | 2024 / Vol. 14941
Funding
National Natural Science Foundation of China
Keywords
Vertical federated learning; Mutual information privacy; Regression
DOI
10.1007/978-3-031-70341-6_19
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study label privacy protection in vertical federated learning (VFL). VFL enables an active party, who possesses labeled data, to improve model performance (utility) by collaborating with passive parties who hold auxiliary features. Recently, there has been growing concern about protecting label privacy against passive parties, who may surreptitiously deduce private labels from the outputs of their bottom models. In contrast to existing defense methods that focus on training-phase perturbation, we propose a novel offline-phase cleansing approach that protects label privacy while barely compromising utility. Specifically, we first formulate a Label Privacy Source Coding (LPSC) problem to remove from the labels the redundant label information already contained in the active party's features, by assigning each sample a new weight and label (i.e., a residual) for federated training. We theoretically demonstrate that LPSC 1) satisfies epsilon-mutual information privacy (epsilon-MIP) and 2) can be reduced to gradient boosting's objective and thus be efficiently optimized. We therefore propose a gradient-boosting-based LPSC method to protect label privacy. Moreover, given that LPSC provides only a bounded privacy enhancement, we further introduce the two-phase LPSC+ framework, which enables a flexible privacy-utility trade-off by incorporating training-phase perturbation methods such as adversarial training. Experimental results on four real-world datasets substantiate the efficacy of LPSC and the superiority of our LPSC+ framework.
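A minimal sketch of the offline cleansing step the abstract describes, assuming the active party fits a gradient boosting model on its own features and keeps only the residuals as the new labels for federated training. This is an illustration of the idea, not the authors' implementation; the function name `lpsc_cleanse`, the use of scikit-learn's `GradientBoostingClassifier`, and the choice of `|residual|` as the per-sample weight are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def lpsc_cleanse(X_active, y, n_estimators=50):
    """Illustrative offline-phase cleansing (hypothetical, binary labels).

    Fit a gradient boosting model on the active party's OWN features,
    then replace each label with the residual: the part of the label
    that the active party's features cannot already explain. Only these
    residuals (and weights) would be used in subsequent federated training.
    """
    model = GradientBoostingClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_active, y)
    # Probability predicted from the active party's features alone.
    p = model.predict_proba(X_active)[:, 1]
    # New "label": the unexplained residual, in (-1, 1) for y in {0, 1}.
    residual = y - p
    # Hypothetical per-sample weight: emphasize hard-to-predict samples.
    weight = np.abs(residual)
    return residual, weight
```

Under this sketch, labels that are fully predictable from the active party's features yield residuals near zero, so little label information remains to leak to passive parties.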
Pages: 313-331
Number of pages: 19