Set of Diverse Queries With Uncertainty Regularization for Composed Image Retrieval

Citations: 0
Authors
Xu, Yahui [1, 2]
Wei, Jiwei [1, 2]
Bin, Yi [3]
Yang, Yang [4, 5]
Ma, Zeyu [1, 2]
Shen, Heng Tao [4, 5]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] Natl Univ Singapore, Inst Data Sci, Singapore 119077, Singapore
[4] Univ Elect Sci & Technol China UESTC, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[5] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Uncertainty; Semantics; Image retrieval; Probabilistic logic; Task analysis; Fuses; Loss measurement; Composed image retrieval; multi-modal learning; image retrieval;
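DOI
10.1109/TCSVT.2024.3401006
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract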
Composed image retrieval aims to retrieve a target image by jointly interpreting composed inputs that consist of a reference image and a complementary modification text. The goal is to find a shared latent space in which the representation of the composed inputs lies close to that of the desired target image. Most previous methods model a one-to-one correspondence between the composed inputs and the target image, encoding each as a single point in the feature space. However, a one-to-one correspondence cannot handle this task effectively because of the inherent ambiguity arising from diverse semantic meanings and data uncertainty. Specifically, the composed inputs and the target image often carry multiple semantic meanings, which affects the retrieval results. Moreover, given the composed inputs (resp. target image), multiple target images (resp. composed inputs) may be equally valid. In this paper, we propose a novel method termed Set of Diverse Queries with Uncertainty Regularization (SDQUR) to address this inherent ambiguity. First, we use diverse queries to adaptively aggregate the composed inputs and the target image into multiple deterministic embeddings that capture the different semantic meanings in the triplet that affect retrieval. These set-based queries exploit the deterministic many-to-many correspondence within each triplet. Second, we provide an uncertainty regularization module that encodes the composed inputs and the target image as Gaussian distributions. Multiple potential positive candidates are sampled from these distributions to model the probabilistic many-to-many correspondence. By combining the complementary deterministic and probabilistic many-to-many correspondences, we achieve consistent improvements on the standard FashionIQ, CIRR, and Shoes benchmarks, surpassing the state-of-the-art methods by a large margin.
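The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the attention-style pooling, the max-cosine set similarity, the fixed log-variance, and every name below (`aggregate_with_queries`, `sample_gaussian`, `set_similarity`) are assumptions chosen only to show how a set of queries might yield several deterministic embeddings and how candidates might be sampled from a Gaussian embedding.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_with_queries(tokens, queries):
    """Pool N token features (N, D) into K embeddings (K, D).

    Each of the K query vectors attends over all tokens, so each output
    row can focus on a different aspect of the input (assumed design).
    """
    scores = queries @ tokens.T / np.sqrt(tokens.shape[1])  # (K, N)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ tokens                                 # (K, D)


def sample_gaussian(mu, log_var, n_samples, rng):
    """Draw n_samples points mu + sigma * eps from a diagonal Gaussian."""
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + np.exp(0.5 * log_var) * eps


def set_similarity(a, b):
    """Max pairwise cosine similarity between two embedding sets.

    A hypothetical set-to-set score; the paper may use a different one.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((a @ b.T).max())


rng = np.random.default_rng(0)
D, K = 8, 4                                      # feature size, number of queries
composed_tokens = rng.standard_normal((10, D))   # stand-in for fused image+text tokens
target_tokens = rng.standard_normal((6, D))      # stand-in for target image tokens
queries = rng.standard_normal((K, D))            # would be learned in practice

# Deterministic many-to-many: K embeddings per side, compared as sets.
composed_set = aggregate_with_queries(composed_tokens, queries)
target_set = aggregate_with_queries(target_tokens, queries)
score = set_similarity(composed_set, target_set)

# Probabilistic many-to-many: sample candidates from a Gaussian embedding.
# Here the mean is the set average and the log-variance is a fixed placeholder;
# a trained model would presumably predict both.
mu = composed_set.mean(axis=0)
log_var = np.full(D, -2.0)
candidates = sample_gaussian(mu, log_var, n_samples=5, rng=rng)
```

In this sketch the queries are random; in a trained model they would be learned parameters, and the sampled candidates would feed a loss rather than being used directly.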
Pages: 10494-10506
Number of pages: 13