Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Cited by: 30
Authors
Cheng, Qingrong [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding;
DOI
10.1016/j.neunet.2020.11.011
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different kinds of multimedia data cause a "heterogeneity gap" among modalities, which is a key challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link different modalities. GRL consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity describes their similarity well. We then build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and directly embed the graph vertices into a common representation space. In this way, the proposed method bypasses the most challenging issue by using a cross-modal graph as a bridge across the "heterogeneity gap" among different modalities; this use of a cross-modal graph as an intermediary agent is simple but effective. Extensive experiments on six widely used datasets show that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
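The pipeline described in the abstract can be illustrated with a minimal, generic sketch: toy "latent" features stand in for the FTLN outputs, a cross-modal graph links image and text vertices whose cosine similarity is high, and a standard spectral embedding of the graph Laplacian places all vertices in one common space. The threshold, dimensions, and spectral step are illustrative assumptions, not the paper's actual FTLN/GRLN architecture.

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two feature matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy latent features (stand-ins for FTLN outputs; sizes are illustrative)
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))   # 4 image vertices, 8-dim latent features
txt = rng.normal(size=(4, 8))   # 4 text vertices, 8-dim latent features

# Cross-modal adjacency: connect image i and text j when similarity is high
# (0.2 is an arbitrary illustrative threshold)
S = cosine_sim(img, txt)
A = np.zeros((8, 8))
A[:4, 4:] = (S > 0.2).astype(float)
A[4:, :4] = A[:4, 4:].T            # keep the graph undirected

# Embed all vertices in one space via the graph Laplacian's low-frequency
# eigenvectors (a generic spectral embedding, not the paper's GRLN)
D = np.diag(A.sum(axis=1))
L = D - A
vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
embedding = vecs[:, 1:3]           # 2-D common-space coordinates, all 8 vertices
```

After this step, retrieval reduces to nearest-neighbor search in `embedding`, where image and text vertices live in the same coordinate system.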
Pages: 143-162
Page count: 20