Enhance Composed Image Retrieval via Multi-Level Collaborative Localization and Semantic Activeness Perception

Cited by: 1
Authors
Zhang, Gangjian [1, 2]
Wei, Shikui [1, 2]
Pang, Huaxin [1, 2]
Qiu, Shuang [3]
Zhao, Yao [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100044, Peoples R China
[3] Taiyuan Univ Technol, Coll Data Sci, Taiyuan 030600, Peoples R China
Keywords
Semantics; Location awareness; Task analysis; Image retrieval; Training; Collaboration; Transformers; Composed image retrieval; multi-modal fusion and embedding; multi-modal representation learning; multi-modal retrieval
DOI
10.1109/TMM.2023.3273466
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Composed image retrieval (CIR) is an emerging and challenging research task that combines two modalities, a reference image and a modification text, into one query to retrieve the target image. In online shopping scenarios, the user supplies the modification text as feedback to describe the difference between the reference image and the desired image. Two main problems must be addressed to handle this task. One is the localization problem: how to precisely find the spatial areas of the image mentioned by the text. The other is the modification problem: how to effectively modify the image semantics based on the text. However, existing methods merely fuse information from the two modalities coarsely, and the accurate spatial and semantic correspondence between these two heterogeneous features tends to be neglected. As a result, image details cannot be precisely located and modified. To this end, we integrate information from the two modalities more accurately in both spatial and semantic aspects. We propose an end-to-end framework for the CIR task that contains three key components: a Multi-level Collaborative Localization module (MCL), a Differential Semantics Discrimination module (DSD), and Image Difference Enhancement constraints (IDE). Specifically, to solve the localization problem, MCL precisely maps the text to image areas by collaboratively using text positioning information on multiple image layers. For the modification problem, DSD builds a distribution to evaluate the modification possibility of each image semantic dimension, and IDE effectively learns the modification patterns of the text against the image embedding based on that distribution. Extensive experiments on three datasets show that the proposed method achieves outstanding performance against state-of-the-art (SOTA) methods.
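The query composition described in the abstract can be illustrated with a toy sketch. This is not the authors' MCL, DSD, or IDE implementation: it only assumes that each embedding dimension receives a gate in [0, 1] expressing how likely the text is to modify it, and that candidates are ranked by cosine similarity. All names (`compose_query`, `rank_candidates`, `gate_logits`) and the hand-made 3-dimensional embeddings are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def compose_query(ref_emb, text_emb, gate_logits):
    """Blend per dimension: a gate near 1 takes the text value,
    a gate near 0 keeps the reference-image value."""
    gates = [sigmoid(g) for g in gate_logits]
    mixed = [(1.0 - g) * r + g * t
             for r, t, g in zip(ref_emb, text_emb, gates)]
    return l2_normalize(mixed)


def rank_candidates(query, candidates):
    """Return candidate names sorted by cosine similarity, best first."""
    scores = {
        name: sum(q * c for q, c in zip(query, l2_normalize(emb)))
        for name, emb in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings with hand-assigned meanings: [is_dress, is_red, is_blue].
ref = [1.0, 1.0, 0.0]            # reference image: a red dress
text = [0.0, 0.0, 1.0]           # modification text: "make it blue"
gate_logits = [-6.0, 6.0, 6.0]   # assumed: change the color dims, keep the garment dim

candidates = {
    "blue_dress": [1.0, 0.0, 1.0],
    "red_dress": [1.0, 1.0, 0.0],
    "blue_shirt": [0.0, 0.0, 1.0],
}

query = compose_query(ref, text, gate_logits)
ranking = rank_candidates(query, candidates)
print(ranking)
```

In this toy setup the gates are fixed by hand; in a learned system they would be predicted from the image and text features, which is the role the abstract assigns to the per-dimension distribution.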
Pages: 916-928
Page count: 13