Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style

Cited by: 17
Authors
Lin, Fengyin [1 ]
Li, Mingkang [1 ]
Li, Da [2 ]
Hospedales, Timothy [2 ,3 ]
Song, Yi-Zhe [4 ]
Qi, Yonggang [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Samsung AI Ctr, Cambridge, England
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[4] Univ Surrey, SketchX, CVSSP, Guildford, Surrey, England
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.02236
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), however with two significant differentiators to prior art: (i) we tackle all variants (inter-category, intra-category, and cross datasets) of ZS-SBIR with just one network ("everything"), and (ii) we would really like to understand how this sketch-photo matching operates ("explainable"). Our key innovation lies with the realization that such a cross-modal matching problem could be reduced to comparisons of groups of key local patches, akin to the seasoned "bag-of-words" paradigm. Just with this change, we are able to achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network, with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute local correspondences between the visual tokens across two modalities, and finally (iii) a kernel-based relation network to assemble local putative matches and produce an overall similarity metric for a sketch-photo pair. Experiments show ours indeed delivers superior performances across all ZS-SBIR settings. The all-important explainable goal is elegantly achieved by visualizing cross-modal token correspondences, and for the first time, via sketch to photo synthesis by universal replacement of all matched photo patches. Code and model are available at https://github.com/buptLinfy/ZSE-SBIR.
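The matching idea in the abstract (local tokens, cross-attention between the two modalities, then a kernel-based score) can be illustrated with a toy sketch. This is not the authors' implementation: the token vectors are made up, the attention has no learned projections, and a fixed RBF kernel averaged over tokens stands in for the learned relation network. The names `cross_attend`, `rbf_kernel`, and `pair_similarity` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys):
    """For each query token, return a softmax-weighted mix of the key tokens.

    Assumption: plain scaled dot-product attention without learned weights.
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        mixed = [sum(w * k[d] for w, k in zip(weights, keys))
                 for d in range(len(q))]
        attended.append(mixed)
    return attended


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2); a stand-in, not the paper's kernel."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def pair_similarity(sketch_tokens, photo_tokens, gamma=1.0):
    """Average kernel score between each sketch token and its attended photo mix."""
    matched = cross_attend(sketch_tokens, photo_tokens)
    scores = [rbf_kernel(s, m, gamma) for s, m in zip(sketch_tokens, matched)]
    return sum(scores) / len(scores)


# Toy tokens (illustrative values only).
sketch = [[1.0, 0.0], [0.0, 1.0]]
photo_similar = [[1.0, 0.0], [0.0, 1.0]]
photo_different = [[-1.0, 0.0], [0.0, -1.0]]

print(pair_similarity(sketch, photo_similar))
print(pair_similarity(sketch, photo_different))
```

With these toy tokens, the photo whose local tokens align with the sketch tokens scores higher than the one whose tokens point the opposite way. Because the score is built from per-token matches, the attention weights themselves can be inspected to see which photo regions each sketch token attended to.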
Pages: 23349-23358
Number of pages: 10
Related papers
50 in total
  • [21] Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval
    Liu, Qing
    Xie, Lingxi
    Wang, Huiyu
    Yuille, Alan L.
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3661 - 3670
  • [22] Energy-Guided Feature Fusion for Zero-Shot Sketch-Based Image Retrieval
    Hao Ren
    Ziqiang Zheng
    Hong Lu
    Neural Processing Letters, 2022, 54 : 5711 - 5720
  • [23] Semi-transductive Learning for Generalized Zero-Shot Sketch-Based Image Retrieval
    Ge, Ce
    Wang, Jingyu
    Qi, Qi
    Sun, Haifeng
    Xu, Tong
    Liao, Jianxin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7678 - 7686
  • [24] Zero-shot sketch-based image retrieval with structure-aware asymmetric disentanglement
    Li, Jiangtong
    Ling, Zhixin
    Niu, Li
    Zhang, Liqing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 218
  • [25] A Zero-Shot Framework for Sketch Based Image Retrieval
    Yelamarthi, Sasi Kiran
    Reddy, Shiva Krishna
    Mishra, Ashish
    Mittal, Anurag
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 316 - 333
  • [26] Zero-Shot Sketch-Based Image Retrieval with Hybrid Information Fusion and Sample Relationship Modeling
    Wu, Weijie
    Li, Jun
    Wu, Zhijian
    Xu, Jianhua
    MULTIMEDIA MODELING, MMM 2025, PT IV, 2025, 15523 : 337 - 350
  • [27] Indicative Vision Transformer for end-to-end zero-shot sketch-based image retrieval
    Zhang, Haoxiang
    Cheng, Deqiang
    Kou, Qiqi
    Asad, Mujtaba
    Jiang, He
    ADVANCED ENGINEERING INFORMATICS, 2024, 60
  • [28] CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not
    Sain, Aneeshan
    Bhunia, Ayan Kumar
    Chowdhury, Pinaki Nath
    Koley, Subhadeep
    Xiang, Tao
    Song, Yi-Zhe
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2765 - 2775
  • [29] Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval
    Deng, Cheng
    Xu, Xinxun
    Wang, Hao
    Yang, Muli
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 8892 - 8902
  • [30] Cross-modal Self-distillation for Zero-shot Sketch-based Image Retrieval
    Tian J.-L.
    Xu X.
    Shen F.-M.
    Shen H.-T.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (09):