Unsupervised Deep Fusion Cross-modal Hashing

Cited by: 4
Authors
Huang, Jiaming [1]
Min, Chen [1]
Jing, Liping [1]
Affiliations
[1] Beijing Jiaotong University, Beijing, People's Republic of China
Source
ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2019
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
multimodal; information retrieval; hashing; deep learning
DOI
10.1145/3340555.3353752
Chinese Library Classification
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
To handle large-scale data in terms of both storage and search time, learning to hash has become popular owing to its efficiency and effectiveness in approximate cross-modal nearest neighbor search. To narrow the semantic gap, most existing unsupervised cross-modal hashing methods try to simultaneously minimize an intra-modal similarity loss and an inter-modal similarity loss. However, these models cannot guarantee in theory that the two losses are minimized simultaneously. In this paper, we first prove theoretically, with the aid of variational inference, that cross-modal hashing can be implemented by preserving both intra-modal and inter-modal similarity, and we point out that maximizing intra-modal and inter-modal similarity are mutually constrained objectives. We therefore propose an unsupervised cross-modal hashing framework, named Unsupervised Deep Fusion Cross-modal Hashing (UDFCH), which leverages data fusion to capture the manifold underlying the modalities and thereby avoids the above problem. Moreover, to reduce the quantization loss, we sample hash codes from different Bernoulli distributions through a reparameterization trick. Our UDFCH framework has two stages: the first mines the intra-modal structure of each modality, while the second determines the modality-aware hash codes by fully accounting for the correlation and manifold structure among modalities. Experiments on three benchmark datasets show that the proposed UDFCH framework outperforms state-of-the-art methods on different cross-modal retrieval tasks.
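The abstract does not spell out which reparameterization trick is used for the Bernoulli sampling. A common differentiable choice in this setting is the Gumbel-sigmoid (binary Concrete) relaxation, sketched below in PyTorch as a minimal illustration, not the authors' exact method; the function name `sample_binary_codes`, the temperature value, and the code length are illustrative assumptions.

```python
import torch

def sample_binary_codes(logits, temperature=0.5):
    """Relaxed Bernoulli sampling via the Gumbel-sigmoid
    (binary Concrete) reparameterization.

    logits: unnormalized log-odds that each bit is 1,
            shape (batch, code_length).
    Returns values in (0, 1) that concentrate near {0, 1}
    as temperature -> 0, while remaining differentiable.
    """
    # Clamp to avoid log(0) at the ends of the uniform range.
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    logistic_noise = torch.log(u) - torch.log1p(-u)
    return torch.sigmoid((logits + logistic_noise) / temperature)

# Example: 16-bit codes for a batch of 4 fused features.
logits = torch.randn(4, 16)
soft_codes = sample_binary_codes(logits)        # used during training
hash_codes = (soft_codes > 0.5).float()         # hard codes at retrieval time
```

Because the noise is injected outside the network parameters, gradients flow through the relaxed codes during training, which is what lets such a scheme reduce quantization loss relative to a hard sign/threshold step.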
Pages: 358-366
Number of pages: 9