Heterogeneous Feature Alignment and Fusion in Cross-Modal Augmented Space for Composed Image Retrieval

Cited by: 3
Authors
Pang, Huaxin [1 ]
Wei, Shikui [1 ]
Zhang, Gangjian [1 ]
Zhang, Shiyin [1 ]
Qiu, Shuang [1 ]
Zhao, Yao [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Image retrieval; Semantics; Task analysis; Visualization; Transformers; Feature extraction; Fuses; Composed image retrieval; embedding fusion; multi-modal learning; image retrieval; REPRESENTATION; FRAMEWORK;
DOI
10.1109/TMM.2022.3208742
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Composed image retrieval (CIR) aims to fuse a reference image and text feedback to search for the desired images. Compared to general image retrieval, it can model the users' search intent more comprehensively and retrieve the target images more accurately, which has significant impact in various real-world applications, such as E-commerce and Internet search. However, because of the heterogeneous semantic gap between modalities, jointly understanding and fusing the image and text is difficult. In this work, to tackle this problem, we propose an end-to-end framework, MCR, which uses both text and images as retrieval queries. The framework consists of four pivotal modules. Specifically, we introduce the Relative Caption-aware Consistency (RCC) constraint to align text pieces and images in the database, which effectively bridges the heterogeneous gap. The Multi-modal Complementary Fusion (MCF) and Cross-modal Guided Pooling (CGP) modules are constructed to mine multiple interactions between image local features and text word features and learn the complementary representation of the composed query. Furthermore, we develop a plug-and-play Weak-text Semantic Augment (WSA) module for datasets with short or incomplete query texts, which supplements the weak-text features and is conducive to modeling an augmented semantic space. Extensive experiments on several benchmarks demonstrate superior performance over existing state-of-the-art methods.
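
As a rough illustration of the composed-query idea described in the abstract, the sketch below fuses local reference-image features with modification-text word features into a single retrieval embedding via cross-attention and pooling. It is a minimal PyTorch sketch under assumed names, shapes, and dimensions (ComposedQueryFusion, 512-d features, mean pooling); it is not the authors' MCR implementation and omits the RCC, CGP, and WSA components.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ComposedQueryFusion(nn.Module):
    """Hypothetical composed-query fusion: text words attend over image regions."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        # Cross-attention: text word features query the image local features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, img_local, txt_words):
        # img_local: (B, N_regions, dim) -- local features of the reference image
        # txt_words: (B, N_words,   dim) -- word features of the modification text
        attended, _ = self.cross_attn(query=txt_words, key=img_local, value=img_local)
        # Pool both streams and fuse their complementary information.
        img_vec = img_local.mean(dim=1)
        txt_vec = attended.mean(dim=1)
        fused = self.proj(torch.cat([img_vec, txt_vec], dim=-1))
        return F.normalize(fused, dim=-1)  # unit-norm composed-query embedding

# Usage: rank database images by cosine similarity to the fused query.
if __name__ == "__main__":
    fusion = ComposedQueryFusion()
    img_local = torch.randn(2, 49, 512)   # e.g. a 7x7 grid of CNN region features
    txt_words = torch.randn(2, 12, 512)   # e.g. 12 token embeddings
    query = fusion(img_local, txt_words)              # (2, 512)
    gallery = F.normalize(torch.randn(100, 512), dim=-1)
    scores = query @ gallery.t()                      # cosine similarities
    top5 = scores.topk(5, dim=-1).indices             # indices of top-5 matches
    print(top5.shape)                                 # torch.Size([2, 5])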
Pages: 6446-6457
Number of pages: 12
Related Papers
50 records in total
  • [1] Heterogeneous Feature Fusion and Cross-modal Alignment for Composed Image Retrieval. Zhang, Gangjian; Wei, Shikui; Pang, Huaxin; Zhao, Yao. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 5353-5362.
  • [2] Cross-Modal Joint Prediction and Alignment for Composed Query Image Retrieval. Yang, Yuchen; Wang, Min; Zhou, Wengang; Li, Houqiang. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 3303-3311.
  • [3] Geometry Sensitive Cross-Modal Reasoning for Composed Query Based Image Retrieval. Zhang, Feifei; Xu, Mingliang; Xu, Changsheng. IEEE Transactions on Image Processing, 2022, 31: 1000-1011.
  • [4] Multi-Modal Transformer With Global-Local Alignment for Composed Query Image Retrieval. Xu, Yahui; Bin, Yi; Wei, Jiwei; Yang, Yang; Wang, Guoqing; Shen, Heng Tao. IEEE Transactions on Multimedia, 2023, 25: 8346-8357.
  • [5] Composed Image Retrieval via Explicit Erasure and Replenishment With Semantic Alignment. Zhang, Gangjian; Wei, Shikui; Pang, Huaxin; Qiu, Shuang; Zhao, Yao. IEEE Transactions on Image Processing, 2022, 31: 5976-5988.
  • [6] Fusion-Based Correlation Learning Model for Cross-Modal Remote Sensing Image Retrieval. Lv, Yafei; Xiong, Wei; Zhang, Xiaohan; Cui, Yaqi. IEEE Geoscience and Remote Sensing Letters, 2022, 19.
  • [7] Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval. Tang, Xu; Wang, Yijing; Ma, Jingjing; Zhang, Xiangrong; Liu, Fang; Jiao, Licheng. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61.
  • [8] Deep Label Feature Fusion Hashing for Cross-Modal Retrieval. Ren, Dongxiao; Xu, Weihua; Wang, Zhonghua; Sun, Qinxiu. IEEE Access, 2022, 10: 100276-100285.
  • [9] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing. Cheng, Qimin; Zhou, Yuzhuo; Fu, Peng; Xu, Yuan; Zhang, Liang. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 4284-4297.
  • [10] Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval. Zhang, Shun; Li, Yupeng; Mei, Shaohui. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61.