Unsupervised Contrastive Cross-Modal Hashing

Citations: 130
Authors
Hu, Peng [1]
Zhu, Hongyuan [2]
Lin, Jie [2]
Peng, Dezhong [1,3,4]
Zhao, Yin-Ping [5]
Peng, Xi [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] ASTAR, Inst Infocomm Res, Singapore 138632, Singapore
[3] Chengdu Ruibei Yingte Informat Technol Ltd Co, Chengdu 610094, Peoples R China
[4] Sichuan Zhiqian Technol Ltd Co, Chengdu 610094, Peoples R China
[5] Northwestern Polytech Univ, Sch Software, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Semantics; Bridges; Optimization; Correlation; Task analysis; Degradation; Binary codes; Common Hamming space; contrastive hashing network; cross-modal retrieval; unsupervised cross-modal hashing
DOI
10.1109/TPAMI.2022.3177356
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. Specifically, i) to address the performance degradation caused by binary optimization for hashing, we propose a novel momentum optimizer that makes the hashing operation learnable within CL, thus making off-the-shelf deep cross-modal hashing possible. In other words, our method does not involve the binary-continuous relaxation used by most existing methods, and thus enjoys better retrieval performance; ii) to alleviate the influence of false-negative pairs (FNPs), i.e., within-class pairs that are wrongly treated as negative pairs, we propose a Cross-modal Ranking Learning loss (CRL) that utilizes the discriminative information from all negative pairs instead of only the hard ones. Thanks to this global strategy, CRL endows our method with better performance because it does not overemphasize the FNPs while ignoring the true-negative pairs. To the best of our knowledge, the proposed method could be one of the first successful contrastive hashing methods. To demonstrate its effectiveness, we carry out experiments on five widely used datasets, comparing against 13 state-of-the-art methods. The code is available at https://github.com/penghu-cs/UCCH.
Pages: 3877-3889
Page count: 13
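
As a rough illustration of the CRL idea described in the abstract, the sketch below implements a margin-based cross-modal ranking loss that averages the margin violation over all negative pairs rather than mining only each anchor's hardest negative. This is a minimal sketch under assumed conventions: the function name, tensor shapes, and PyTorch formulation are illustrative, not the authors' implementation (see the linked repository for the official code).

```python
import torch
import torch.nn.functional as F

def cross_modal_ranking_loss(img_codes: torch.Tensor,
                             txt_codes: torch.Tensor,
                             margin: float = 0.2) -> torch.Tensor:
    """Hypothetical CRL-style loss.

    img_codes, txt_codes: (N, K) continuous hash-code surrogates;
    row i of each modality forms the i-th positive cross-modal pair.
    """
    img = F.normalize(img_codes, dim=1)
    txt = F.normalize(txt_codes, dim=1)
    sim = img @ txt.t()            # (N, N) cross-modal cosine similarities
    pos = sim.diag().unsqueeze(1)  # each anchor's positive similarity, (N, 1)
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # Margin violations of every negative pair, in both retrieval directions.
    viol_i2t = F.relu(margin + sim - pos)[off_diag]      # image anchor, text negatives
    viol_t2i = F.relu(margin + sim - pos.t())[off_diag]  # text anchor, image negatives
    # Averaging over ALL negatives, instead of taking each anchor's hardest
    # (max) negative, keeps any single false negative from dominating.
    return viol_i2t.mean() + viol_t2i.mean()

# Toy usage: a batch of 8 image-text pairs with 64-bit code surrogates.
loss = cross_modal_ranking_loss(torch.randn(8, 64), torch.randn(8, 64))
```

The averaging is what makes this a "global strategy" in the abstract's sense: a false-negative pair contributes only one term among all negatives to the gradient, whereas hardest-negative mining would repeatedly select exactly those within-class pairs, since they tend to be the most similar.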