Enhance Composed Image Retrieval via Multi-Level Collaborative Localization and Semantic Activeness Perception

Times cited: 1

Authors:

Zhang, Gangjian [1,2]

Wei, Shikui [1,2]

Pang, Huaxin [1,2]

Qiu, Shuang [3]

Zhao, Yao [1,2]

Affiliations:

[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China

[2] Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100044, Peoples R China

[3] Taiyuan Univ Technol, Coll Data Sci, Taiyuan 030600, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Semantics; Location awareness; Task analysis; Image retrieval; Training; Collaboration; Transformers; Composed image retrieval; multi-modal fusion and embedding; multi-modal representation learning; multi-modal retrieval; image retrieval;

DOI:

10.1109/TMM.2023.3273466

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Composed image retrieval (CIR) is an emerging and challenging research task that combines two modalities, a reference image and a modification text, into a single query to retrieve the target image. In online shopping scenarios, for example, the user provides the modification text as feedback describing the difference between the reference image and the desired image. Handling this task requires addressing two main problems. One is the localization problem: how to precisely find the spatial areas of the image that the text mentions. The other is the modification problem: how to effectively modify the image semantics according to the text. However, existing methods only coarsely fuse information from the two modalities and tend to neglect the accurate spatial and semantic correspondence between these two heterogeneous features. As a result, image details cannot be precisely located and modified. To this end, we integrate information from the two modalities more accurately in both the spatial and the semantic aspects. We propose an end-to-end framework for the CIR task that contains three key components: the Multi-level Collaborative Localization module (MCL), the Differential Semantics Discrimination module (DSD), and the Image Difference Enhancement constraints (IDE). Specifically, to solve the localization problem, MCL precisely grounds the text in the corresponding image areas by collaboratively using text positioning information across multiple image layers. For the modification problem, DSD builds a distribution that evaluates the modification possibility of each image semantic dimension, and IDE learns the modification patterns of the text against the image embedding based on this distribution. Extensive experiments on three datasets show that the proposed method achieves outstanding performance compared with state-of-the-art (SOTA) methods.
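The modification idea in the abstract (a per-dimension estimate of how likely each image semantic dimension is to be changed by the text, followed by retrieval of the target from a gallery) can be illustrated with a toy sketch. This is not the authors' MCL/DSD/IDE implementation: the function names `compose_query`, `cosine`, and `retrieve`, the sigmoid gate standing in for the per-dimension modification distribution, the linear blend of image and text embeddings, and cosine-similarity ranking are all hypothetical assumptions chosen only to make the idea concrete.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def compose_query(ref_img, text, gate_logits):
    """Hypothetical per-dimension composition of a CIR query.

    gate_logits[d] is an assumed score for how likely semantic dimension d
    of the reference image is to be modified by the text. Dimensions with a
    high modification probability take their value from the text embedding;
    the rest keep the reference-image value.
    """
    gates = [sigmoid(g) for g in gate_logits]
    return [(1.0 - p) * r + p * t for r, t, p in zip(ref_img, text, gates)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def retrieve(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 3-d embeddings: the text asks to change only dimension 1.
    ref = [1.0, 0.0, 0.0]
    txt = [0.0, 1.0, 0.0]
    q = compose_query(ref, txt, [-10.0, 10.0, -10.0])
    gallery = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(retrieve(q, gallery))  # prints [1, 0, 2]
```

In this toy example the gate keeps dimension 0 from the reference image and takes dimension 1 from the text, so the gallery item that combines both ranks first. A learned model would replace the hand-set `gate_logits` with scores predicted from the image and text features.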


Pages: 916-928

Number of pages: 13

