Federated learning for supervised cross-modal retrieval

Cited by: 3
Authors
Li, Ang [1]
Li, Yawen [2]
Shao, Yingxia [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing Key Lab Intelligent Telecommun Software &, Beijing 100876, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Econ & Management, Beijing 100876, Peoples R China
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2024 / Vol. 27 / No. 04
Funding
National Natural Science Foundation of China;
Keywords
Federated learning; Cross-modal retrieval; Supervised learning; Multi-modal learning;
DOI
10.1007/s11280-024-01249-4
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In the last decade, the explosive growth of multi-modal data has pushed cross-modal retrieval to the forefront of information retrieval research. Effective cross-modal retrieval algorithms are crucial for meeting user requirements and for supporting downstream tasks such as cross-modal recommendation and multi-modal content generation. Previous cross-modal retrieval methods typically search for a single common subspace and neglect the possibility that multiple common subspaces may mutually reinforce each other, which results in poor retrieval performance. To address this issue, we propose a Federated Supervised Cross-Modal Retrieval approach (FedSCMR), which leverages competition to learn the optimal common subspace and adaptively aggregates the common subspaces of multiple clients during dynamic global aggregation. To reduce the differences between modalities, FedSCMR minimizes the semantic discrimination and consistency losses in the common subspace, in addition to modeling semantic discrimination in the label space. It also minimizes the modal discrimination and semantic invariance losses across common subspaces to strengthen cross-subspace constraints and promote learning of the optimal common subspace. In the aggregation stage of federated learning, we design an adaptive model aggregation scheme that dynamically and collaboratively evaluates each model's contribution based on data volume, data category, model loss, and mean average precision, and uses this evaluation to aggregate the multi-party common subspaces. Experimental results on two publicly available datasets demonstrate that the proposed FedSCMR surpasses state-of-the-art cross-modal retrieval methods.
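The abstract names four factors behind the adaptive aggregation (data volume, data category, model loss, and mean average precision) but does not give the weighting formula. The following is a minimal sketch of one way such a contribution-weighted aggregation could look, not the FedSCMR scheme itself. The function names `contribution_scores` and `aggregate`, the equal averaging of the four normalized factors, and the use of inverse loss are all assumptions made for illustration.

```python
import numpy as np


def _normalize(x, eps=1e-12):
    """Scale a non-negative vector so it sums to 1 (guarding against a zero sum)."""
    x = np.asarray(x, dtype=float)
    return x / max(x.sum(), eps)


def contribution_scores(num_samples, num_classes, losses, maps):
    """Hypothetical per-client contribution score.

    Assumption: each factor is normalized across clients and the four
    normalized factors are averaged with equal weight. Lower loss is
    treated as a larger contribution by using the inverse loss.
    """
    inv_loss = 1.0 / (np.asarray(losses, dtype=float) + 1e-8)
    factors = [
        _normalize(num_samples),  # data volume
        _normalize(num_classes),  # data category coverage
        _normalize(inv_loss),     # model loss (inverted)
        _normalize(maps),         # mean average precision
    ]
    return np.mean(factors, axis=0)


def aggregate(client_params, weights):
    """Weighted average of per-client parameter dictionaries."""
    weights = _normalize(weights)
    keys = client_params[0].keys()
    return {
        k: sum(w * p[k] for w, p in zip(weights, client_params))
        for k in keys
    }


# Toy example with two clients; all numbers are made up for illustration.
clients = [{"W": np.ones((2, 2))}, {"W": 3.0 * np.ones((2, 2))}]
weights = contribution_scores(
    num_samples=[100, 300],
    num_classes=[5, 10],
    losses=[0.8, 0.4],
    maps=[0.60, 0.75],
)
global_params = aggregate(clients, weights)
```

In this toy run the second client has more data, more categories, lower loss, and higher mean average precision, so it receives the larger weight and the aggregated parameters sit closer to its values. A real system would likely learn or tune how the four factors are combined rather than averaging them equally.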
Pages: 24