You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

Cited by: 2

Authors:

Koley, Subhadeep [1, 2]

Bhunia, Ayan Kumar [1]

Sain, Aneeshan [1]

Chowdhury, Pinaki Nath [1]

Xiang, Tao [1, 2]

Song, Yi-Zhe [1, 2]

Affiliations:

[1] Univ Surrey, CVSSP, SketchX, Guildford, Surrey, England

[2] iFlyTek Surrey Joint Res Ctr Artificial Intellige, Guildford, Surrey, England

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

Keywords:

DOI:

10.1109/CVPR52733.2024.01562

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously exploring the fine-grained representation capabilities of both sketch and text, orchestrating a duet between the two. The end result enables precise retrievals previously unattainable, allowing users to pose ever-finer queries and incorporate attributes like colour and contextual cues from text. For this purpose, we introduce a novel compositionality framework, effectively combining sketches and text using pre-trained CLIP models, while eliminating the need for extensive fine-grained textual descriptions. Last but not least, our system extends to novel applications in composed image retrieval, domain attribute transfer, and fine-grained generation, providing solutions for various real-world scenarios.

Pages: 16509-16519

Page count: 11
