Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Cited by: 30
Authors
Cheng, Qingrong [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding;
DOI
10.1016/j.neunet.2020.11.011
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different kinds of multimedia data cause a "heterogeneity gap" among modalities, which is a key challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link different modalities. GRL consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity describes their similarity well. We then build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and directly embed the graph vertices into a common representation space. In this way, the proposed method bypasses the most challenging issue by using a cross-modal graph as a bridge across the "heterogeneity gap" among different modalities; this use of a cross-modal graph as an intermediary agent is simple but effective. Extensive experiments on six widely used datasets show that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
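The pipeline described in the abstract can be illustrated with a minimal, generic sketch: toy "latent" features stand in for the FTLN outputs, a cross-modal graph links image and text vertices whose cosine similarity is high, and a standard spectral embedding of the graph Laplacian places all vertices in one common space. The threshold, dimensions, and spectral step are illustrative assumptions, not the paper's actual FTLN/GRLN architecture.

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two feature matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy latent features (stand-ins for FTLN outputs; sizes are illustrative)
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))   # 4 image vertices, 8-dim latent features
txt = rng.normal(size=(4, 8))   # 4 text vertices, 8-dim latent features

# Cross-modal adjacency: connect image i and text j when similarity is high
# (0.2 is an arbitrary illustrative threshold)
S = cosine_sim(img, txt)
A = np.zeros((8, 8))
A[:4, 4:] = (S > 0.2).astype(float)
A[4:, :4] = A[:4, 4:].T            # keep the graph undirected

# Embed all vertices in one space via the graph Laplacian's low-frequency
# eigenvectors (a generic spectral embedding, not the paper's GRLN)
D = np.diag(A.sum(axis=1))
L = D - A
vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
embedding = vecs[:, 1:3]           # 2-D common-space coordinates, all 8 vertices
```

After this step, retrieval reduces to nearest-neighbor search in `embedding`, where image and text vertices live in the same coordinate system.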
Pages: 143-162
Page count: 20