A Cross-modal image retrieval method based on contrastive learning

Authors
Zhou, Wen [1]
Affiliations
[1] Wuhan Vocat Coll Software & Engn, Wuhan 430205, Peoples R China
Source
JOURNAL OF OPTICS-INDIA | 2023 / Vol. 53 / Issue 3
Keywords
Cross-modal; Contrastive learning; Picture and text retrieval
DOI
10.1007/s12596-023-01382-9
Chinese Library Classification number
O43 [Optics]
Discipline classification code
070207; 0803
Abstract
With the growth of large-scale image surveillance and video data in the field of public security, image retrieval has significant application value. Image-level retrieval has difficulty distinguishing different instances within an image and suffers from a lack of visible objects. To address these problems, we apply a cross-modal, instance-level retrieval method that uses text queries to retrieve images or videos. Natural-language-based image search poses a new challenge for the fine-grained understanding of visual and linguistic patterns. To bridge language and vision, we propose a cross-modal image retrieval method based on contrastive learning, which consists of two parts: multi-granularity cross-modal contrastive learning, and a multi-modal fusion module based on a similarity matrix and encoder-decoder captioning. In experiments on open-source datasets, our method outperforms several state-of-the-art (SOTA) cross-modal baselines, and ablation experiments demonstrate its effectiveness.
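As a rough illustration of the general idea named in the abstract (contrastive learning over an image-text similarity matrix), the following is a minimal sketch and not the paper's implementation. It assumes a generic symmetric InfoNCE-style loss, which the abstract does not specify; the function names, the toy 2-D embeddings, and the temperature value 0.07 are all hypothetical choices made for this example. The multi-granularity and captioning components are not modeled.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_embs, text_embs):
    """sim[i][j] = cosine similarity between image i and text j."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image losses (assumed form)."""
    sim = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for text-to-image
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

def retrieve(sim, text_index):
    """Rank image indices by similarity to one text query, best first."""
    scores = [row[text_index] for row in sim]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (hypothetical): image k is meant to match text k.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]

sim = similarity_matrix(images, texts)
aligned = symmetric_contrastive_loss(images, texts)
shuffled = symmetric_contrastive_loss(images, [texts[1], texts[2], texts[0]])
ranking = retrieve(sim, 0)
```

In this sketch, correctly paired embeddings give a lower loss than mismatched ones, and retrieval for a text query reduces to sorting one column of the similarity matrix.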
Pages: 2098-2107
Number of pages: 10
Related papers
50 in total
  • [21] Category-Level Contrastive Learning for Unsupervised Hashing in Cross-Modal Retrieval
    Xu, Mengying
    Luo, Linyin
    Lai, Hanjiang
    Yin, Jian
    DATA SCIENCE AND ENGINEERING, 2024, 9 (03) : 251 - 263
  • [22] Semi-supervised cross-modal learning for cross modal retrieval and image annotation
    Fuhao Zou
    Xingqiang Bai
    Chaoyang Luan
    Kai Li
    Yunfei Wang
    Hefei Ling
    World Wide Web, 2019, 22 : 825 - 841
  • [23] Image-text bidirectional learning network based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Gu, Guanghua
    NEUROCOMPUTING, 2022, 483 : 148 - 159
  • [24] Intramodal consistency in triplet-based cross-modal learning for image retrieval
    Mallea, Mario
    Nanculef, Ricardo
    Araya, Mauricio
    MACHINE LEARNING, 2025, 114 (04)
  • [25] Semi-supervised cross-modal learning for cross modal retrieval and image annotation
    Zou, Fuhao
    Bai, Xingqiang
    Luan, Chaoyang
    Li, Kai
    Wang, Yunfei
    Ling, Hefei
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02) : 825 - 841
  • [26] Cross-modal Retrieval Using Contrastive Learning of Visual-Semantic Embeddings
    Jain, Anurag
    Verma, Yashaswi
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022 : 4693 - 4699
  • [27] Cross-modal Image-Text Retrieval with Multitask Learning
    Luo, Junyu
    Shen, Ying
    Ao, Xiang
    Zhao, Zhou
    Yang, Min
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019 : 2309 - 2312
  • [28] Cross-modal contrastive learning for aspect-based recommendation
    Won, Heesoo
    Oh, Byungkook
    Yang, Hyeongjun
    Lee, Kyong-Ho
    INFORMATION FUSION, 2023, 99
  • [29] A Cross-Modal Image and Text Retrieval Method Based on Efficient Feature Extraction and Interactive Learning CAE
    Yin, Xiuye
    Chen, Liyong
    SCIENTIFIC PROGRAMMING, 2022, 2022
  • [30] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)