A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

Cited by: 0

Authors:

Gao, Ying ^{[1,2]}

Xie, Yuxin ^{[1]}

Deng, Huanghao ^{[1]}

Zhu, Zukun ^{[1]}

Zhang, Yiyu ^{[1]}

Affiliations:

[1] School of Cyber Science and Technology, Beihang University, Beijing

[2] Zhongguancun Laboratory, Beijing

Source:

Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology | 2024, Vol. 46, No. 08

Keywords:

Commutative encryption; Data alignment; Homomorphic encryption; Privacy protection; Vertical federated learning;

DOI:

10.11999/JEIT231234

Abstract:

In vertical federated learning, the clients' datasets share overlapping sample IDs but hold features of different dimensions, so data alignment is necessary before model training. Because current data alignment techniques make the intersection of the sample IDs public, aligning the data without leaking the intersection becomes a key issue. The proposed privacy-preserving data ALIGNment framework (ALIGN) is based on commutative encryption and homomorphic encryption, and consists mainly of data encryption, ciphertext blinding, private intersection, and feature splicing. The sample IDs are encrypted twice with a commutative encryption algorithm, so that identical ciphertexts correspond to identical plaintexts, and the sample features are encrypted and then randomly blinded with a homomorphic encryption algorithm. The intersection of the encrypted sample IDs is computed, and the corresponding features are then spliced and secret-shared among the participants. Compared with existing techniques, the framework protects the privacy of the ID intersection and allows samples whose IDs fall outside the intersection to be removed safely. The security proof shows that no participant can learn anything about another participant except the data size, which guarantees the effectiveness of the privacy-preserving strategies. Simulation experiments demonstrate that, for every 10% reduction in redundant data, the runtime is shortened by about 1.3 seconds and the model accuracy remains above 85%. The results indicate that using the ALIGN framework for data alignment in vertical federated learning improves the efficiency and accuracy of subsequent model training. © 2024 Science Press. All rights reserved.
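The property the abstract relies on, that encrypting a sample ID twice under two parties' keys gives the same ciphertext regardless of key order, can be illustrated with a minimal sketch. This is not the paper's construction: it assumes a textbook exponentiation-based commutative cipher over a toy prime modulus, and the names `P`, `keygen`, `hash_to_group`, `enc`, and `double_encrypt` are hypothetical. It also omits the homomorphic encryption, blinding, and secret-sharing steps, so unlike ALIGN it does not hide the intersection from the party that compares the ciphertexts.

```python
import hashlib
import secrets
from math import gcd

# Toy modulus (the Mersenne prime 2**127 - 1); an assumption for illustration only,
# not a parameter from the paper and not suitable for real deployments.
P = 2**127 - 1


def keygen() -> int:
    """Pick a secret exponent coprime to P - 1 so that encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k


def hash_to_group(sample_id: str) -> int:
    """Map a sample ID to an integer in [1, P - 1] via SHA-256."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def enc(value: int, key: int) -> int:
    """Commutative encryption: enc(enc(x, a), b) == enc(enc(x, b), a)."""
    return pow(value, key, P)


def double_encrypt(ids, first_key: int, second_key: int) -> set:
    """Encrypt every ID under first_key, then under second_key."""
    return {enc(enc(hash_to_group(i), first_key), second_key) for i in ids}


if __name__ == "__main__":
    key_a, key_b = keygen(), keygen()
    ids_a = ["u1", "u2", "u3"]   # party A's sample IDs (made-up data)
    ids_b = ["u2", "u3", "u4"]   # party B's sample IDs (made-up data)
    # A encrypts first and B second; for B's IDs the order is reversed.
    cipher_a = double_encrypt(ids_a, key_a, key_b)
    cipher_b = double_encrypt(ids_b, key_b, key_a)
    # Matching ciphertexts mark shared IDs without exposing the plaintext IDs.
    print(len(cipher_a & cipher_b))
```

Because the exponents commute, the double-encrypted sets can be compared directly, which is what lets the intersection be taken over ciphertexts rather than plaintext IDs.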

Pages: 3419-3427

Page count: 8

