Multi-label guided graph similarity learning for cross-modal retrieval

Times Cited: 0
Authors
Zhu, Jie [1 ]
Wang, Dan [1 ]
Shi, Guangtian [1 ]
Wu, Shufang [2 ]
Affiliations
[1] Hebei Univ, Coll Math & Informat Sci, Baoding 071002, Peoples R China
[2] Hebei Univ, Coll Management, Baoding 071002, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Graph convolutional networks; Graph similarity learning; Fusion-alignment strategy; Multi-label; NETWORK;
DOI
10.1016/j.inffus.2025.103142
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the continuous growth of multimodal data on the Internet, cross-modal retrieval has attracted increasing attention. Most existing methods map multimodal data into a common representation space in which samples with similar semantic content lie close together. However, these methods do not fully exploit the similarities among multi-labels, or between multi-labels and samples. Moreover, maintaining semantic consistency between the common representations of different modalities remains an open problem. To tackle these issues, this paper proposes a Multi-label Guided Graph Similarity Learning (MGGSL) method. MGGSL constructs a Multi-label (ML) graph from the similarities among multi-labels in the dataset and extracts multi-label embeddings with a graph convolutional network (GCN) to guide the learning of common representations for the different modalities. In addition, the similarities between multi-labels and samples are used to construct a Visual Semantic (VS) graph and a Textual Semantic (TS) graph, and a graph similarity learning approach is proposed that enforces the semantic consistency of cross-modal features from four perspectives: node similarity, adjacency-matrix similarity, edge similarity, and degree similarity. Experiments on three widely used datasets (NUS-WIDE, MIRFlickr-25K, and MS-COCO) demonstrate that MGGSL outperforms several existing state-of-the-art methods.
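To make the abstract's first idea concrete, the following is a minimal illustrative sketch (not the authors' implementation; all shapes, weights, and normalization choices here are assumptions) of building a multi-label graph from label co-occurrence and running one GCN propagation step to obtain label embeddings:

```python
import numpy as np

def label_adjacency(Y):
    """Y: (n_samples, n_labels) binary multi-label matrix.
    Returns a symmetrically normalized co-occurrence adjacency with self-loops."""
    co = Y.T @ Y                        # label-by-label co-occurrence counts
    np.fill_diagonal(co, 0)
    A = (co > 0).astype(float)          # edge if two labels ever co-occur (toy choice)
    A += np.eye(A.shape[0])             # self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
    return d_inv_sqrt @ A @ d_inv_sqrt  # D^{-1/2} A D^{-1/2}

def gcn_layer(A_hat, X, W):
    """One GCN layer: ReLU(A_hat X W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(100, 5)).astype(float)  # toy multi-label annotations
A_hat = label_adjacency(Y)
X = np.eye(5)                                        # one-hot initial label features
W = rng.standard_normal((5, 8)) * 0.1                # toy layer weights
label_emb = gcn_layer(A_hat, X, W)                   # (5, 8) label embeddings
```

In MGGSL these embeddings would then guide the common representations of the image and text branches; the paper's actual graph construction and loss terms are described in the article itself.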
Pages: 12