A cross-modal image retrieval method based on contrastive learning

Author
Zhou, Wen [1]
Affiliation
[1] Wuhan Vocat Coll Software & Engn, Wuhan 430205, Peoples R China
Source
JOURNAL OF OPTICS-INDIA | 2023 / Vol. 53 / Issue 3
Keywords
Cross-modal; Contrastive learning; Picture and text retrieval;
DOI
10.1007/s12596-023-01382-9
Chinese Library Classification number
O43 [Optics];
Discipline classification code
070207 ; 0803 ;
Abstract
With the growth of large-scale image surveillance and video data in the field of public security, image retrieval has significant application value. Image-level retrieval struggles to distinguish different instances within an image and suffers from a lack of visible objects. To address these problems, we apply a new cross-modal instance-level retrieval method that retrieves images or videos from text queries. Natural language-based image search poses a new challenge for the fine-grained understanding of visual and linguistic patterns. To bridge language and vision, we propose a cross-modal image retrieval method based on contrastive learning, which consists of two parts: multi-granularity cross-modal contrastive learning, and a multi-modal fusion module based on a similarity matrix and encoder-decoder captioning. In experiments on open-source datasets, our method outperforms several state-of-the-art (SOTA) cross-modality baselines, and ablation experiments demonstrate its effectiveness.
Pages: 2098-2107
Page count: 10
Related papers
50 in total
  • [31] Cross-Modal Contrastive Learning for Code Search
    Shi, Zejian
    Xiong, Yun
    Zhang, Xiaolong
    Zhang, Yao
    Li, Shanshan
    Zhu, Yangyong
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2022), 2022, : 94 - 105
  • [32] Adversarial cross-modal retrieval based on dictionary learning
    Shang, Fei
    Zhang, Huaxiang
    Zhu, Lei
    Sun, Jiande
    NEUROCOMPUTING, 2019, 355 : 93 - 104
  • [33] Adaptive Adversarial Learning based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Wang, Zhongrui
    Gu, Guanghua
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [34] Semantic supervised learning based Cross-Modal Retrieval
    Li, Zhuoyi
    Fu, Hao
    Gu, Guanghua
    PROCEEDINGS OF THE ACM TURING AWARD CELEBRATION CONFERENCE-CHINA 2024, ACM-TURC 2024, 2024, : 207 - 209
  • [35] Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval
    Zhu, Lilu
    Wang, Yang
    Hu, Yanfeng
    Su, Xiaolu
    Fu, Kun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [36] Cross-modal Contrastive Learning for Speech Translation
    Ye, Rong
    Wang, Mingxuan
    Li, Lei
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5099 - 5113
  • [37] Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning
    Lin, Ming-Xian
    Yang, Jie
    Wang, He
    Lai, Yu-Kun
    Jia, Rongfei
    Zhao, Binqiang
    Gao, Lin
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11385 - 11395
  • [38] Cross-modal Knowledge Graph Contrastive Learning for Machine Learning Method Recommendation
    Cao, Xianshuai
    Shi, Yuliang
    Wang, Jihu
    Yu, Han
    Wang, Xinjun
    Yan, Zhongmin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3694 - 3702
  • [39] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [40] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16