You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

Cited by: 2
Authors
Koley, Subhadeep [1, 2]
Bhunia, Ayan Kumar [1]
Sain, Aneeshan [1]
Chowdhury, Pinaki Nath [1]
Xiang, Tao [1, 2]
Song, Yi-Zhe [1, 2]
Affiliations
[1] Univ Surrey, CVSSP, SketchX, Guildford, Surrey, England
[2] iFlyTek Surrey Joint Res Ctr Artificial Intellige, Guildford, Surrey, England
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024
DOI
10.1109/CVPR52733.2024.01562
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously exploring the fine-grained representation capabilities of both sketch and text, orchestrating a duet between the two. The end result enables precise retrievals previously unattainable, allowing users to pose ever-finer queries and incorporate attributes like colour and contextual cues from text. For this purpose, we introduce a novel compositionality framework, effectively combining sketches and text using pre-trained CLIP models, while eliminating the need for extensive fine-grained textual descriptions. Last but not least, our system extends to novel applications in composed image retrieval, domain attribute transfer, and fine-grained generation, providing solutions for various real-world scenarios.
Pages: 16509-16519
Number of pages: 11