Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross-Modal Retrieval

Cited by: 81
Authors
Zhang, Jian [1]
Peng, Yuxin [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Correlation; Manifolds; Dogs; Data models; Semantics; Generative adversarial networks; Multimedia databases; Cross-modal hashing; manifold structure; NETWORK
DOI
10.1109/TMM.2019.2922128
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Cross-modal hashing aims to map heterogeneous cross-modal data into a common Hamming space, which can realize fast and flexible retrieval across different modalities. Unsupervised cross-modal hashing is more flexible and applicable than supervised methods, since no intensive labeling work is involved. However, existing unsupervised methods learn the hashing functions by preserving inter- and intra-correlations while ignoring the underlying manifold structure across different modalities, which is extremely helpful in capturing the meaningful nearest neighbors of different modalities for cross-modal retrieval. Furthermore, existing works mainly focus on pairwise relation modeling while ignoring the correlations within multiple modalities. To address the above-mentioned problems, in this paper, we propose a multi-pathway generative adversarial hashing approach for unsupervised cross-modal retrieval, which makes full use of a generative adversarial network's ability for unsupervised representation learning to exploit the underlying manifold structure of cross-modal data. The main contributions can be summarized as follows: First, we propose a multi-pathway generative adversarial network to model cross-modal hashing in an unsupervised fashion. In the proposed network, given the data of one modality, the generative model tries to fit the distribution over the manifold structure and selects informative data of other modalities to challenge the discriminative model. The discriminative model learns to distinguish the generated data and the true positive data sampled from the correlation graph to achieve better retrieval accuracy. These two models are trained in an adversarial way to improve each other and promote hashing function learning. 
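The core idea of mapping heterogeneous modalities into a common Hamming space can be illustrated with a toy sketch. Everything here is an assumption for illustration only: the feature dimensions, the random linear projections standing in for learned hashing functions, and the function names are not the paper's actual model, which learns its hashing functions adversarially.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features for two modalities (dimensions are illustrative assumptions).
img_feats = rng.standard_normal((6, 128))   # 6 images, 128-D visual features
txt_feats = rng.standard_normal((6, 300))   # 6 texts, 300-D text features

CODE_LEN = 32  # length of the binary hash code

# Hypothetical hashing functions: one linear projection per modality,
# mapping both modalities into the same 32-bit Hamming space.
W_img = rng.standard_normal((128, CODE_LEN))
W_txt = rng.standard_normal((300, CODE_LEN))

def hash_codes(feats, W):
    """Binarize projected features by thresholding at zero to get {0,1} codes."""
    return (feats @ W > 0).astype(np.uint8)

img_codes = hash_codes(img_feats, W_img)
txt_codes = hash_codes(txt_feats, W_txt)

def hamming(a, b):
    """Hamming distance between two binary codes (count of differing bits)."""
    return int(np.count_nonzero(a != b))

# Cross-modal retrieval: rank all texts by Hamming distance to image 0.
ranking = sorted(range(len(txt_codes)),
                 key=lambda j: hamming(img_codes[0], txt_codes[j]))
print(ranking)
```

Because the codes are binary, distance computation reduces to XOR and popcount, which is what makes retrieval in Hamming space fast enough for large multimedia databases.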
Second, we propose a correlation graph-based approach to capture the underlying manifold structure across different modalities so that data of different modalities but within the same manifold can have a smaller Hamming distance to promote retrieval accuracy. Extensive experiments compared with state-of-the-art methods on three widely used datasets verify the effectiveness of our proposed approach.
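A correlation graph of the kind the abstract describes can be sketched as a k-nearest-neighbour graph over unlabeled features, from which "true positive" samples are drawn. This is a minimal sketch under stated assumptions: the feature matrix, the neighbourhood size, and the cosine-similarity choice are all illustrative, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy unlabeled features; in the paper's setting these would be deep
# features of images and texts (sizes here are assumptions).
feats = rng.standard_normal((8, 16))

K = 3  # number of nearest neighbours kept per node (assumed)

# Cosine similarity between all pairs of samples.
normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
sim = normed @ normed.T
np.fill_diagonal(sim, -np.inf)  # exclude self-loops

# Correlation graph: connect each sample to its K most similar samples,
# approximating the local manifold structure without any labels.
graph = {i: np.argsort(sim[i])[-K:][::-1].tolist() for i in range(len(feats))}

def sample_positive(i, rng):
    """Draw a 'true positive' for sample i from its graph neighbourhood,
    as the discriminator's positive examples are sampled in the abstract."""
    return int(rng.choice(graph[i]))

pos = sample_positive(0, rng)
print(graph[0], pos)
```

Training then encourages samples connected in this graph, even across modalities, to receive nearby binary codes, so that data on the same manifold end up with small Hamming distances.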
Pages: 174-187
Number of pages: 14