Image-Text Retrieval via Contrastive Learning with Auxiliary Generative Features and Support-set Regularization

Cited by: 4
Authors
Zhang, Lei [1 ,5 ]
Yang, Min [1 ]
Li, Chengming [2 ]
Xu, Ruifeng [3 ,4 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen, Peoples R China
[3] Harbin Inst Technol, Shenzhen, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
[5] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal image-text retrieval; Contrastive learning; Support-set regularization; Generative features;
DOI
10.1145/3477495.3531783
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
In this paper, we bridge the heterogeneity gap between different modalities and improve image-text retrieval by taking advantage of auxiliary image-to-text and text-to-image generative features with contrastive learning. Concretely, contrastive learning is devised to narrow the distance between aligned image-text pairs and enlarge the distance between unaligned pairs from both inter- and intra-modality perspectives, with the help of cross-modal retrieval features and auxiliary generative features. In addition, we devise a support-set regularization term to further improve contrastive learning by constraining the distance between each image/text and the corresponding cross-modal support-set information contained in the same semantic category. To evaluate the effectiveness of the proposed method, we conduct experiments on three benchmark datasets (i.e., MIRFLICKR-25K, NUS-WIDE, MS COCO). Experimental results show that our model significantly outperforms strong baselines for cross-modal image-text retrieval. For reproducibility, we release the code and data publicly at: https://github.com/Hambaobao/CRCGS.
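As a rough illustration of the inter-modality contrastive objective the abstract describes (pulling aligned image-text pairs together and pushing unaligned pairs apart), the following is a minimal symmetric InfoNCE-style sketch over precomputed embeddings. It is not the paper's exact loss; the function name, the use of NumPy, and the temperature value are all illustrative assumptions.

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    Aligned pairs (row i of image_emb with row i of text_emb) are
    treated as positives; all other rows in the batch are negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # pairwise similarity matrix
    labels = np.arange(len(img))            # positives sit on the diagonal

    def xent(lg):
        # Cross-entropy of the softmax over one retrieval direction.
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned embeddings yield a near-zero loss, while mismatched pairs are penalized, which mirrors the narrowing/enlarging behavior described in the abstract.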
Pages: 1938-1943
Page count: 6