Heterogeneous Feature Alignment and Fusion in Cross-Modal Augmented Space for Composed Image Retrieval

Cited by: 3
Authors
Pang, Huaxin [1 ]
Wei, Shikui [1 ]
Zhang, Gangjian [1 ]
Zhang, Shiyin [1 ]
Qiu, Shuang [1 ]
Zhao, Yao [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Image retrieval; Semantics; Task analysis; Visualization; Transformers; Feature extraction; Fuses; Composed image retrieval; embedding fusion; multi-modal learning; image retrieval; REPRESENTATION; FRAMEWORK;
DOI
10.1109/TMM.2022.3208742
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Composed image retrieval (CIR) aims to fuse a reference image and text feedback to search for the desired images. Compared with general image retrieval, it can model users' search intent more comprehensively and retrieve the target images more accurately, which has significant impact in real-world applications such as E-commerce and Internet search. However, because of the heterogeneous semantic gap, joint understanding and fusion of image and text are difficult to achieve. In this work, to tackle this problem, we propose MCR, an end-to-end framework that uses both text and images as retrieval queries. The framework mainly consists of four pivotal modules. Specifically, we introduce the Relative Caption-aware Consistency (RCC) constraint to align text pieces and images in the database, which effectively bridges the heterogeneous gap. The Multi-modal Complementary Fusion (MCF) and Cross-modal Guided Pooling (CGP) modules mine multiple interactions between image local features and text word features and learn a complementary representation of the composed query. Furthermore, we develop a plug-and-play Weak-text Semantic Augment (WSA) module for datasets with short or incomplete query texts, which supplements the weak-text features and helps model an augmented semantic space. Extensive experiments demonstrate superior performance over existing state-of-the-art algorithms on several benchmarks.
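To make the composed-query idea in the abstract concrete, the sketch below shows one minimal way to fuse reference-image region features with modification-text word features into a single query embedding and score a gallery by cosine similarity. The ComposedQueryFusion module, its dimensions, and the gated cross-attention fusion rule are illustrative assumptions for a PyTorch-style implementation; they are not the MCR/RCC/MCF/CGP/WSA design described in the paper.

```python
# Hypothetical sketch of a composed-query fusion step for composed image
# retrieval (CIR). Module names, dimensions, and the fusion rule are
# illustrative assumptions, not the authors' MCR implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComposedQueryFusion(nn.Module):
    """Fuse reference-image region features with text word features
    into one composed-query embedding (assumed design)."""

    def __init__(self, img_dim=512, txt_dim=512, embed_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Cross-attention: text words attend to image regions.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=8,
                                                batch_first=True)
        # Gate deciding how much of the image vs. text vector to keep.
        self.gate = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim),
                                  nn.Sigmoid())

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, R, img_dim) region features of the reference image
        # txt_feats: (B, W, txt_dim) word features of the modification text
        img = self.img_proj(img_feats)
        txt = self.txt_proj(txt_feats)
        attended, _ = self.cross_attn(query=txt, key=img, value=img)
        txt_vec = attended.mean(dim=1)          # pool attended words
        img_vec = img.mean(dim=1)               # pool image regions
        g = self.gate(torch.cat([img_vec, txt_vec], dim=-1))
        composed = g * img_vec + (1.0 - g) * txt_vec
        return F.normalize(composed, dim=-1)


if __name__ == "__main__":
    fusion = ComposedQueryFusion()
    query = fusion(torch.randn(2, 49, 512), torch.randn(2, 12, 512))
    gallery = F.normalize(torch.randn(100, 512), dim=-1)  # candidate image embeddings
    scores = query @ gallery.t()                          # cosine-similarity retrieval
    print(scores.argmax(dim=-1))                          # indices of best-matching images
```

In this toy setup the composed query and the gallery embeddings share one normalized space, so retrieval reduces to a dot product; the paper's alignment constraint (RCC) and pooling (CGP) address the same goal with a more elaborate design.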
Pages: 6446-6457
Page count: 12