When CLIP meets cross-modal hashing retrieval: A new strong baseline

Cited by: 36
Authors
Xia, Xinyu [1 ,2 ]
Dong, Guohua [1 ]
Li, Fengling [3 ]
Zhu, Lei [2 ]
Ying, Xiaomin [1 ]
Affiliations
[1] Beijing Inst Basic Med Sci, Ctr Computat Biol, Beijing 100850, Peoples R China
[2] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan 250358, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, Australian Artificial Intelligence Inst, Ultimo, NSW 2007, Australia
Keywords
Cross-modal retrieval; Hashing; CLIP; Modality fusion; Contrastive learning;
DOI
10.1016/j.inffus.2023.101968
CLC Number
TP18 [Artificial intelligence theory];
Subject Classification Numbers
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed significant progress on various multi-modal tasks driven by Contrastive Language-Image Pre-training (CLIP), a large-scale multi-modal model that learns visual representations from natural language supervision. However, the potential effects of CLIP on cross-modal hashing retrieval have not yet been investigated. In this paper, we explore for the first time the effects of CLIP on cross-modal hashing retrieval performance and propose a simple but strong baseline, the Unsupervised Contrastive Multi-modal Fusion Hashing network (UCMFH). We first extract off-the-shelf visual and linguistic features from the CLIP model as the input sources for the cross-modal hashing functions. To further mitigate the semantic gap between the image and text features, we design an effective contrastive multi-modal learning module that leverages a multi-modal fusion transformer encoder supervised by a contrastive loss, enhancing modality interaction while improving the semantic representation of each modality. Furthermore, we design a contrastive hash learning module to produce high-quality, modality-correlated hash codes. Experiments show that our simple new unsupervised baseline UCMFH achieves significant performance improvements over state-of-the-art supervised and unsupervised cross-modal hashing methods. Our experiments also demonstrate the remarkable performance of CLIP features on the cross-modal hashing retrieval task compared to the deep visual and linguistic features used in existing state-of-the-art methods. The source code for our approach is publicly available at: https://github.com/XinyuXia97/UCMFH.
Pages: 12
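
The abstract above describes a three-stage pipeline: off-the-shelf CLIP feature extraction, a contrastive multi-modal fusion transformer, and contrastive hash learning. Below is a minimal PyTorch sketch of that pipeline for orientation; the layer sizes, hash code length, temperature, and module names (FusionEncoder, HashHead) are illustrative assumptions, not the authors' released implementation (see the GitHub link in the abstract for that).

```python
# Minimal sketch of the UCMFH pipeline as described in the abstract.
# Layer sizes, code length, and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: off-the-shelf CLIP features as input to the hashing network.
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model.eval()

@torch.no_grad()
def extract_features(images, texts):
    """images: preprocessed tensor [N, 3, 224, 224]; texts: list of N strings."""
    img_feat = clip_model.encode_image(images.to(device)).float()
    txt_feat = clip_model.encode_text(clip.tokenize(texts, truncate=True).to(device)).float()
    return img_feat, txt_feat  # both [N, 512] for ViT-B/32

# Stage 2: multi-modal fusion transformer encoder for modality interaction.
class FusionEncoder(nn.Module):
    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, img_feat, txt_feat):
        tokens = torch.stack([img_feat, txt_feat], dim=1)  # [N, 2, dim]
        fused = self.encoder(tokens)                        # cross-modal attention
        return fused[:, 0], fused[:, 1]                     # enhanced per-modality features

# Stage 3: hash head producing K-bit codes (tanh relaxation during training).
class HashHead(nn.Module):
    def __init__(self, dim=512, bits=64):
        super().__init__()
        self.fc = nn.Linear(dim, bits)

    def forward(self, x):
        return torch.tanh(self.fc(x))  # binarized with sign(.) at retrieval time

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE: matched image-text pairs in a batch are positives."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```

In such a setup, one contrastive loss would supervise the fused features and another the relaxed hash codes, mirroring the two contrastive modules named in the abstract; at retrieval time the tanh outputs are binarized with torch.sign and compared by Hamming distance.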