Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

Cited by: 17
Authors
Lin, Fengyin [1 ]
Li, Mingkang [1 ]
Li, Da [2 ]
Hospedales, Timothy [2 ,3 ]
Song, Yi-Zhe [4 ]
Qi, Yonggang [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Samsung AI Ctr, Cambridge, England
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[4] Univ Surrey, SketchX, CVSSP, Guildford, Surrey, England
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023
DOI
10.1109/CVPR52729.2023.02236
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), but with two significant differentiators from prior art: (i) we tackle all variants (inter-category, intra-category, and cross-dataset) of ZS-SBIR with just one network ("everything"), and (ii) we aim to understand how this sketch-photo matching operates ("explainable"). Our key innovation lies in the realization that such a cross-modal matching problem can be reduced to comparisons of groups of key local patches, akin to the seasoned "bag-of-words" paradigm. With this change alone, we are able to achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute local correspondences between the visual tokens across the two modalities, and finally (iii) a kernel-based relation network to assemble local putative matches and produce an overall similarity metric for a sketch-photo pair. Experiments show that ours indeed delivers superior performance across all ZS-SBIR settings. The all-important explainability goal is elegantly achieved by visualizing cross-modal token correspondences and, for the first time, via sketch-to-photo synthesis by universal replacement of all matched photo patches. Code and model are available at https://github.com/buptLinfy/ZSE-SBIR.
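The abstract's core idea, scoring a sketch-photo pair by soft-matching local tokens across the two modalities and then aggregating the local matches into one similarity, can be illustrated with a toy sketch. This is not the authors' implementation (the linked repository has that). It assumes tokens are already given as plain feature vectors, omits the learnable tokenizer and the learned query/key/value projections of a real transformer layer, and substitutes a fixed Gaussian (RBF) kernel with a hypothetical bandwidth `gamma` for the paper's learned kernel-based relation network. All function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys, values):
    """Scaled dot-product cross-attention: each query token (one modality)
    returns a weighted average of the value tokens (the other modality).
    Assumption: no learned Q/K/V projections, unlike a real transformer layer."""
    scale = math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)])
    return out


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2).
    Assumption: a fixed kernel standing in for a learned relation network."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_similarity(sketch_tokens, photo_tokens, gamma=1.0):
    """Hypothetical 'bag of local matches' score for one sketch-photo pair:
    soft-match every sketch token to the photo tokens, score each putative
    match with the kernel, and average the local scores into one value in [0, 1]."""
    matched = cross_attend(sketch_tokens, photo_tokens, photo_tokens)
    local_scores = [rbf_kernel(s, m, gamma) for s, m in zip(sketch_tokens, matched)]
    return sum(local_scores) / len(local_scores)


if __name__ == "__main__":
    sketch = [[1.0, 0.0], [0.0, 1.0]]          # two toy sketch tokens
    photo_match = [[1.0, 0.0], [0.0, 1.0]]     # photo tokens aligned with the sketch
    photo_other = [[-1.0, 0.0], [0.0, -1.0]]   # photo tokens pointing the opposite way
    print(pair_similarity(sketch, photo_match))  # higher score
    print(pair_similarity(sketch, photo_other))  # lower score
```

On these toy vectors the aligned photo scores roughly 0.80 and the opposed photo roughly 0.11. Averaging treats every sketch token's match as one equally weighted "word" in the bag; a learned relation network, as the abstract describes, could instead weight or combine the local matches non-uniformly.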
Pages: 23349-23358
Number of pages: 10
Related papers (50 in total)
  • [31] XPNet: Cross-Domain Prototypical Network for Zero-Shot Sketch-Based Image Retrieval
    Li, Mingkang
    Qi, Yonggang
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, PRCV 2022, 2022, 13534 : 394 - 410
  • [32] Zero-Shot Sketch-Based Image Retrieval Using StyleGen and Stacked Siamese Neural Networks
    Gopu, Venkata Rama Muni Kumar
    Dunna, Madhavi
    JOURNAL OF IMAGING, 2024, 10 (04)
  • [33] Task-like training paradigm in CLIP for zero-shot sketch-based image retrieval
    Zhang, Haoxiang
    Cheng, Deqiang
    Jiang, He
    Liu, Jingjing
    Kou, Qiqi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (19) : 57811 - 57828
  • [34] Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning
    Singha, Mainak
    Jha, Ankit
    Gupta, Divyam
    Singla, Pranav
    Banerjee, Biplab
    COMPUTER VISION - ECCV 2024, PT XXIV, 2025, 15082 : 1 - 19
  • [35] Norm-guided Adaptive Visual Embedding for Zero-Shot Sketch-Based Image Retrieval
    Wang, Wenjie
    Shi, Yufeng
    Chen, Shiming
    Peng, Qinmu
    Zheng, Feng
    You, Xinge
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021 : 1106 - 1112
  • [36] Cross-Domain Feature Semantic Calibration for Zero-Shot Sketch-Based Image Retrieval
    He, Xuewan
    Wang, Jielei
    Xia, Qianxin
    Lu, Guoming
    Tang, Yuan
    Lu, Hongxia
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME 2024, 2024
  • [37] Domain disentanglement and fusion based on hyperbolic neural networks for zero-shot sketch-based image retrieval
    Zhang, Qing
    Zhang, Jing
    Su, Xiangdong
    Wang, Yonghe
    Bao, Feilong
    Gao, Guanglai
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [38] Deep supervision network with contrastive learning for zero-shot sketch-based retrieval
    Shu, Zhenqiu
    Zhuo, Guangyao
    Yu, Jun
    Yu, Zhengtao
    APPLIED SOFT COMPUTING, 2024, 167
  • [39] Augmented Multimodality Fusion for Generalized Zero-Shot Sketch-Based Visual Retrieval
    Jing, Taotao
    Xia, Haifeng
    Hamm, Jihun
    Ding, Zhengming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3657 - 3668
  • [40] Zero-shot sketch-based image retrieval via adaptive relation-aware metric learning
    Liu, Yang
    Dang, Yuhao
    Gao, Xinbo
    Han, Jungong
    Shao, Ling
    PATTERN RECOGNITION, 2024, 152