Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

Times cited: 17

Authors:

Lin, Fengyin [1]

Li, Mingkang [1]

Li, Da [2]

Hospedales, Timothy [2,3]

Song, Yi-Zhe [4]

Qi, Yonggang [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

[2] Samsung AI Ctr, Cambridge, England

[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[4] Univ Surrey, SketchX, CVSSP, Guildford, Surrey, England

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.02236

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), however with two significant differentiators to prior art: (i) we tackle all variants (inter-category, intra-category, and cross-dataset) of ZS-SBIR with just one network ("everything"), and (ii) we would really like to understand how this sketch-photo matching operates ("explainable"). Our key innovation lies in the realization that such a cross-modal matching problem can be reduced to comparisons of groups of key local patches, akin to the seasoned "bag-of-words" paradigm. With just this change, we are able to achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute local correspondences between the visual tokens across the two modalities, and finally (iii) a kernel-based relation network to assemble local putative matches and produce an overall similarity metric for a sketch-photo pair. Experiments show that ours indeed delivers superior performance across all ZS-SBIR settings. The all-important explainable goal is elegantly achieved by visualizing cross-modal token correspondences and, for the first time, via sketch-to-photo synthesis by universal replacement of all matched photo patches. Code and model are available at https://github.com/buptLinfy/ZSE-SBIR.
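The matching idea in the abstract (local tokens, cross-modal correspondence, then an aggregated similarity) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `cross_attend` and `pair_similarity` are hypothetical, the tokens are hand-written vectors rather than outputs of a learnable tokenizer, and a fixed RBF kernel with mean pooling stands in for the learned kernel-based relation network.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, keys):
    # Hypothetical helper: for each query token, return the
    # softmax-weighted mix of the key tokens (scaled dot-product attention,
    # with the keys also serving as values).
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        mixed = [sum(w * k[d] for w, k in zip(weights, keys))
                 for d in range(len(q))]
        attended.append(mixed)
    return attended

def rbf(a, b, gamma=1.0):
    # Fixed RBF kernel; an assumption standing in for a learned relation module.
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq)

def pair_similarity(sketch_tokens, photo_tokens, gamma=1.0):
    # Mean kernel score between each sketch token and its attended photo match.
    matched = cross_attend(sketch_tokens, photo_tokens)
    scores = [rbf(s, m, gamma) for s, m in zip(sketch_tokens, matched)]
    return sum(scores) / len(scores)

sketch = [[1.0, 0.0], [0.0, 1.0]]
photo_close = [[1.0, 0.0], [0.0, 1.0]]
photo_far = [[-1.0, 0.0], [0.0, -1.0]]
sim_close = pair_similarity(sketch, photo_close)
sim_far = pair_similarity(sketch, photo_far)
```

In this toy setting the photo whose tokens align with the sketch tokens receives the higher score. The per-token attention weights are also what one would inspect to visualize which photo patch each sketch patch was matched to.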

Pages: 23349-23358

Page count: 10
