Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval

Cited by: 46
Authors
Cheng, Miaomiao [1 ]
Jing, Liping [1 ]
Ng, Michael K. [2 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, 3 Shangyuancun, Beijing 100044, Peoples R China
[2] Univ Hong Kong, Dept Math, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Multimedia retrieval; cross-modal hashing; unsupervised learning; partially paired data
DOI
10.1145/3389547
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
With the rapid development of social websites, different media types (such as text, image, and video) increasingly describe the same topic across large-scale heterogeneous data sources. To efficiently identify inter-media correlations for multimedia retrieval, unsupervised cross-modal hashing (UCMH) has attracted increasing interest because it greatly reduces computation and storage costs. However, most UCMH methods assume that the data from different modalities are well paired, so existing UCMH methods may not achieve satisfactory performance when only partially paired data are available. In this article, we propose a new UCMH method called robust unsupervised cross-modal hashing (RUCMH). The major contribution lies in jointly learning modality-specific hash functions, exploring the correlations among modalities with partial or even no pairwise correspondence, and preserving the information of the original features as much as possible. The learning process is modeled as a joint minimization problem, and the corresponding optimization algorithm is presented. A series of experiments is conducted on four real-world datasets (Wiki, MIRFlickr, NUS-WIDE, and MS-COCO). The results demonstrate that RUCMH significantly outperforms state-of-the-art unsupervised cross-modal hashing methods, especially in the partially paired case, which validates its effectiveness.
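To make the retrieval setting concrete, the sketch below illustrates the general cross-modal hashing pipeline the abstract refers to: each modality gets its own hash function mapping features to binary codes, and cross-modal retrieval ranks items by Hamming distance between codes. This is a minimal illustration with random linear projections and synthetic data, not the RUCMH objective itself; the variable names, feature dimensions, and use of NumPy are all assumptions for demonstration.

```python
import numpy as np

# Minimal cross-modal hashing sketch (illustrative only, not RUCMH):
# modality-specific linear hash functions followed by Hamming ranking.
rng = np.random.default_rng(0)
n_img, n_txt, d_img, d_txt, n_bits = 100, 100, 128, 64, 32

X_img = rng.standard_normal((n_img, d_img))  # image features (synthetic)
X_txt = rng.standard_normal((n_txt, d_txt))  # text features (synthetic)

# Modality-specific hash functions. Here they are random projections;
# RUCMH instead learns them jointly from partially paired data by
# solving a joint minimization problem.
W_img = rng.standard_normal((d_img, n_bits))
W_txt = rng.standard_normal((d_txt, n_bits))

B_img = np.sign(X_img @ W_img)  # binary codes in {-1, +1}
B_txt = np.sign(X_txt @ W_txt)

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query code.

    For {-1, +1} codes, inner product = n_bits - 2 * Hamming distance,
    so distances can be computed with a single matrix-vector product.
    """
    dists = (n_bits - database_codes @ query_code) / 2
    return np.argsort(dists)

# Cross-modal query: use a text item's code to retrieve images.
top10 = hamming_rank(B_txt[0], B_img)[:10]
print(top10)
```

In the learned setting, only the projections change: W_img and W_txt would come from the paper's joint minimization over partially paired data, while the binarization and Hamming-ranking retrieval steps stay the same, which is what makes hashing-based retrieval fast at scale.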
Pages: 25