Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

Cited by: 17

Authors:

Lin, Fengyin [1]

Li, Mingkang [1]

Li, Da [2]

Hospedales, Timothy [2,3]

Song, Yi-Zhe [4]

Qi, Yonggang [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

[2] Samsung AI Ctr, Cambridge, England

[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[4] Univ Surrey, SketchX, CVSSP, Guildford, Surrey, England

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.02236

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), however with two significant differentiators to prior art: (i) we tackle all variants (inter-category, intra-category, and cross-dataset) of ZS-SBIR with just one network ("everything"), and (ii) we would really like to understand how this sketch-photo matching operates ("explainable"). Our key innovation lies in the realization that such a cross-modal matching problem could be reduced to comparisons of groups of key local patches, akin to the seasoned "bag-of-words" paradigm. With just this change, we are able to achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute local correspondences between the visual tokens across the two modalities, and finally (iii) a kernel-based relation network to assemble local putative matches and produce an overall similarity metric for a sketch-photo pair. Experiments show that ours indeed delivers superior performance across all ZS-SBIR settings. The all-important explainable goal is elegantly achieved by visualizing cross-modal token correspondences and, for the first time, via sketch-to-photo synthesis by universal replacement of all matched photo patches. Code and model are available at https://github.com/buptLinfy/ZSE-SBIR.
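
As a rough illustration of components (ii) and (iii) in the abstract, the sketch below computes cross-attention between two sets of token embeddings and then reduces the local matches to a single score for the pair. It is not the authors' implementation (the repository linked above holds that). The token arrays are random stand-ins for the tokenizer output, the names `cross_attention` and `pair_similarity` are hypothetical, there are no learned query/key/value projections, and a plain mean of cosine similarities stands in for the kernel-based relation network.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(sketch_tokens, photo_tokens):
    """Attend from every sketch token to all photo tokens.

    Returns the attention weights (n_sketch, n_photo) and, for each
    sketch token, the attention-weighted photo feature (n_sketch, d).
    """
    d = sketch_tokens.shape[-1]
    scores = sketch_tokens @ photo_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    attended = weights @ photo_tokens
    return weights, attended


def pair_similarity(sketch_tokens, photo_tokens):
    # Assumption: average cosine similarity between each sketch token and
    # its attended photo feature, a simple stand-in for a relation network.
    _, attended = cross_attention(sketch_tokens, photo_tokens)
    num = (sketch_tokens * attended).sum(axis=-1)
    den = (np.linalg.norm(sketch_tokens, axis=-1)
           * np.linalg.norm(attended, axis=-1) + 1e-8)
    return float((num / den).mean())


rng = np.random.default_rng(0)
sketch = rng.normal(size=(4, 8))   # 4 hypothetical sketch tokens, dim 8
photo = rng.normal(size=(6, 8))    # 6 hypothetical photo tokens, dim 8

weights, attended = cross_attention(sketch, photo)
score = pair_similarity(sketch, photo)
```

Each row of `weights` can be read as a soft correspondence from one sketch token to the photo tokens, which is the kind of quantity the abstract describes visualizing for explainability.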

Pages: 23349-23358

Page count: 10
