A Multi-Modal Vertical Federated Learning Framework Based on Homomorphic Encryption

Cited by: 9
Authors
Gong, Maoguo [1 ]
Zhang, Yuanqiao [1 ]
Gao, Yuan [1 ]
Qin, A. K. [2 ]
Wu, Yue [1 ]
Wang, Shanfeng [1 ]
Zhang, Yihong [1 ]
Affiliations
[1] Xidian Univ, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Shaanxi, Peoples R China
[2] Swinburne Univ Technol, Dept Comp Technol, Melbourne, Vic 3122, Australia
Funding
National Natural Science Foundation of China;
Keywords
Vertical federated learning; universal framework; homomorphic encryption; bivariate Taylor series expansion; multi-modal learning; cross-domain semantic feature extraction;
DOI
10.1109/TIFS.2023.3340994
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Classification Code
081202;
Abstract
Federated learning has gained prominence as an effective solution for addressing data silos, enabling collaboration among multiple parties without sharing their data. However, existing federated learning algorithms often neglect the challenge posed by multi-modal data distribution. Moreover, previous pioneering works face limitations in encrypting the exponential and logarithmic operations of an objective function with multiple independent variables, and they rely on a third-party cooperator for encryption. To address these limitations, this paper introduces a universal multi-modal vertical federated learning framework. To tackle the data distribution challenge, we propose a two-step multi-modal transformer model that captures cross-domain semantic features effectively. For encryption, where traditional additively homomorphic encryption algorithms fall short by supporting only addition and multiplication, we employ a bivariate Taylor series expansion to transform the objective function. Integrating these components, we present a comprehensive training and transmission protocol that eliminates the need for a third-party cooperator during the encryption process. Extensive experiments conducted on diverse video-text and image-text datasets validate the superior performance of our framework compared to state-of-the-art approaches, affirming its effectiveness in multi-modal vertical federated learning settings.
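To illustrate the encryption idea in the abstract: additively homomorphic schemes (e.g., Paillier) support only ciphertext addition and plaintext-scalar multiplication, so exponential and logarithmic terms must first be replaced by a low-degree polynomial. The sketch below (an illustration, not the paper's exact protocol; the loss, expansion point, and order are assumptions) applies a second-order Taylor expansion to a logistic-loss term whose argument is split across two parties' partial scores, yielding a polynomial that is compatible with additive homomorphic evaluation.

```python
import math

# Illustrative sketch (assumed setup, not the paper's exact objective):
# two parties hold partial scores u and v, and the loss term
#     f(u, v) = log(1 + exp(-(u + v)))
# contains exp/log operations that additive HE cannot evaluate.
# Its second-order Taylor expansion around (u, v) = (0, 0) is
#     f(u, v) ~= log(2) - (u + v)/2 + (u + v)^2 / 8,
# a polynomial in u and v, so each term can be computed with only
# additions and scalar multiplications over ciphertexts.

def logloss_exact(u: float, v: float) -> float:
    """Exact bivariate log-loss term (not HE-friendly)."""
    return math.log(1.0 + math.exp(-(u + v)))

def logloss_taylor2(u: float, v: float) -> float:
    """Second-order Taylor approximation around (0, 0) (HE-friendly)."""
    z = u + v
    return math.log(2.0) - z / 2.0 + (z * z) / 8.0

if __name__ == "__main__":
    # Near the expansion point the approximation error is tiny.
    for u, v in [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.1)]:
        print(f"u={u:+.1f} v={v:+.1f} "
              f"exact={logloss_exact(u, v):.6f} "
              f"taylor={logloss_taylor2(u, v):.6f}")
```

The approximation is accurate only near the expansion point, which is why such schemes typically re-expand around the current model state or clip inputs; far from the origin the polynomial diverges from the true loss.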
Pages: 1826-1839
Page count: 14