Set of Diverse Queries With Uncertainty Regularization for Composed Image Retrieval

Citations: 0
Authors
Xu, Yahui [1, 2]
Wei, Jiwei [1, 2]
Bin, Yi [3]
Yang, Yang [4, 5]
Ma, Zeyu [1, 2]
Shen, Heng Tao [4, 5]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] Natl Univ Singapore, Inst Data Sci, Singapore 119077, Singapore
[4] Univ Elect Sci & Technol China UESTC, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[5] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Uncertainty; Semantics; Image retrieval; Probabilistic logic; Task analysis; Fuses; Loss measurement; Composed image retrieval; multi-modal learning; image retrieval;
DOI
10.1109/TCSVT.2024.3401006
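Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract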
Composed image retrieval aims to search for a target image by jointly understanding composed inputs that consist of a reference image and a complementary modification text. The goal is to find a shared latent space in which the representation of the composed inputs lies close to the desired target image. Most previous methods capture a one-to-one correspondence between the composed inputs and the target image, encoding each as a single point in the feature space. However, a one-to-one correspondence cannot handle this task effectively because of the inherent ambiguity that arises from multiple semantic meanings and from data uncertainty. Specifically, the composed inputs and the target image each carry several semantic meanings, and these affect the retrieval results. Moreover, given the composed inputs (resp. the target image), there are multiple target images (resp. composed inputs) that make equal sense. In this paper, we propose a novel method termed Set of Diverse Queries with Uncertainty Regularization (SDQUR) to address this inherent ambiguity problem. First, we use diverse queries to adaptively aggregate the composed inputs and the target image into multiple deterministic embeddings that capture the different semantic meanings in the triplet that affect the retrieval process. These set-based queries allow the method to exploit the deterministic many-to-many correspondence within each triplet. Second, we provide an uncertainty regularization module that encodes the composed inputs and the target image as Gaussian distributions. Multiple potential positive candidates are sampled from these distributions to model a probabilistic many-to-many correspondence. By combining the complementary deterministic and probabilistic many-to-many correspondences, we achieve consistent improvements on the standard FashionIQ, CIRR, and Shoes benchmarks, surpassing state-of-the-art methods by a large margin.
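The two ideas in the abstract, set-based aggregation with several queries and sampling candidates from a Gaussian embedding, can be illustrated with a minimal sketch. This is not the authors' SDQUR implementation: the function names (`aggregate_with_queries`, `sample_gaussian`), the scaled dot-product attention used for pooling, the random query vectors, and the fixed log-variance are all assumptions made for illustration. In a trained model the queries and the variance would presumably be learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_with_queries(tokens, queries):
    """Pool N token features into K embeddings, one per query (hypothetical helper).

    Each query attends over all tokens with scaled dot-product attention,
    so different queries can emphasize different parts of the input.
    """
    dim = len(tokens[0])
    pooled = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        pooled.append([sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)])
    return pooled


def sample_gaussian(mean, log_var, num_samples, rng):
    """Draw samples as mean + std * eps with eps ~ N(0, 1) (hypothetical helper)."""
    std = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
        for _ in range(num_samples)
    ]


rng = random.Random(0)
dim = 4
# Stand-ins for token-level features of a composed input (assumed shape: 6 x 4).
tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(6)]
# Three query vectors; random here, assumed to be learned in practice.
queries = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]

# Deterministic part: one embedding per query.
set_embeddings = aggregate_with_queries(tokens, queries)

# Probabilistic part: treat one embedding as a Gaussian mean and sample candidates.
log_var = [-2.0] * dim  # assumed fixed value for the sketch
candidates = sample_gaussian(set_embeddings[0], log_var, 5, rng)
```

Here `set_embeddings` holds three vectors for the same input, and `candidates` holds five samples drawn around the first of them, which mirrors the deterministic and probabilistic many-to-many views at toy scale.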
Pages: 10494-10506
Page count: 13
Related papers
50 in total
  • [21] Robust Linear Subspace for Image Set Retrieval
    Zhu, Fuli
    Chen, Wenhai
    Chen, Liang
    ICMLC 2020: 2020 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2018, : 390 - 394
  • [22] An approach based on multiple representations and multiple queries for invariant image retrieval
    Abbadeni, Noureddine
    ADVANCES IN VISUAL INFORMATION SYSTEMS, 2007, 4781 : 570 - 579
  • [23] Target-Guided Composed Image Retrieval
    Wen, Haokun
    Zhang, Xian
    Song, Xuemeng
    Wei, Yinwei
    Nie, Liqiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 915 - 923
  • [24] LLM-Enhanced Composed Image Retrieval: An Intent Uncertainty-Aware Linguistic-Visual Dual Channel Matching Model
    Ge, Hongfei
    Jiang, Yuanchun
    Sun, Jianshan
    Yuan, Kun
    Liu, Yezheng
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)
  • [25] APPROACHES TO IMAGE RETRIEVAL USING FUZZY SET THEORY
    Yang, Li
    Hu, Xuelong
    Pan, Jun
    2008 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND SIGNAL PROCESSING, VOLS 1 AND 2, 2007, : 422 - 425
  • [26] A New Feature Set For Content Based Image Retrieval
    Rao, M. Babu
    Kavitha, Ch
    Rao, B. Prabhakara
    Govardhan, A.
    2013 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2013, : 84 - 89
  • [27] Linguistic Patterns and Cross Modality-based Image Retrieval for Complex Queries
    Chaudhary, Chandramani
    Goyal, Poonam
    Moniz, Joel Ruben Antony
    Goyal, Navneet
    Chen, Yi-Ping Phoebe
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 257 - 265
  • [28] Composed image retrieval: a survey on recent research and development
    Wan, Yongquan
    Zou, Guobing
    Zhang, Bofeng
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [29] Cross-Modal Joint Prediction and Alignment for Composed Query Image Retrieval
    Yang, Yuchen
    Wang, Min
    Zhou, Wengang
    Li, Houqiang
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3303 - 3311
  • [30] MULTI-ORDER ADVERSARIAL REPRESENTATION LEARNING FOR COMPOSED QUERY IMAGE RETRIEVAL
    Fu, Zhixiao
    Chen, Xinyuan
    Dong, Jianfeng
    Ji, Shouling
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 1685 - 1689