Iterative graph attention memory network for cross-modal retrieval

Cited by: 19
Authors
Dong, Xinfeng [1 ]
Zhang, Huaxiang [1 ,3 ]
Dong, Xiao [2 ]
Lu, Xu [1 ]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250014, Shandong, Peoples R China
[2] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Guangdong, Peoples R China
[3] Shandong Jiaotong Univ, Sch Informat Sci & Elect Engn, Jinan 250358, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Graph convolutional network; Graph attention mechanism; Representation; Multiview
DOI
10.1016/j.knosys.2021.107138
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
How to eliminate the semantic gap between multi-modal data, and how to fuse multi-modal data effectively, are the key problems of cross-modal retrieval. Because semantics are abstract, any single sample represents them only partially. To obtain complementary semantic information for samples sharing the same semantics, we construct a local graph for each instance and use a graph feature extractor (GFE) to reconstruct the sample representation from the adjacency relationships between the sample and its neighbors. Because many cross-modal methods focus only on learning from paired samples and cannot integrate further cross-modal information from the other modality, we propose a cross-modal graph attention strategy that generates a graph attention representation for each sample from the local graph of its paired sample. To bridge the heterogeneity gap between modalities, we fuse the features of the two modalities with a recurrent gated memory network, which selects prominent features from the other modality and filters out unimportant information, yielding a more discriminative feature representation in the common latent space. Experiments on four benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art cross-modal methods. (C) 2021 Elsevier B.V. All rights reserved.
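The cross-modal graph attention strategy described in the abstract can be sketched roughly as follows. This is an illustrative NumPy sketch under assumptions, not the authors' implementation: the function names, the projection matrices `W_q`/`W_k`, and the scaled dot-product scoring are all assumed, since the abstract does not give the exact formulation. The idea shown is only that a sample from one modality (e.g. an image) attends over the local graph of its paired sample in the other modality (e.g. a text node plus its neighbors) to produce a graph attention representation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_graph_attention(query, neighbor_feats, W_q, W_k):
    """Attend from one modality's sample (`query`) over the local graph
    of its paired sample in the other modality (`neighbor_feats`),
    returning an attention-weighted aggregate and the weights.
    All projection matrices and the scoring rule are illustrative
    assumptions, not taken from the paper.
    """
    q = query @ W_q                       # project query:  (d,) -> (h,)
    k = neighbor_feats @ W_k              # project keys:   (n, d) -> (n, h)
    scores = k @ q / np.sqrt(q.shape[0])  # scaled dot-product scores: (n,)
    alpha = softmax(scores)               # attention over graph neighbors
    return alpha @ neighbor_feats, alpha  # graph attention representation

# Toy usage: an image embedding attends over its paired text's local graph.
rng = np.random.default_rng(0)
d, h, n = 8, 4, 5                         # feature dim, attention dim, graph size
img = rng.standard_normal(d)              # image-side query sample
txt_graph = rng.standard_normal((n, d))   # paired text node + its neighbors
W_q = rng.standard_normal((d, h))
W_k = rng.standard_normal((d, h))
rep, alpha = cross_modal_graph_attention(img, txt_graph, W_q, W_k)
```

In the full model this aggregate would then be fused with the sample's own GFE-reconstructed representation via the recurrent gated memory network; that stage is omitted here.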
Pages: 12