ZoomViT: an observation behavior-based fine-grained recognition scheme

Citations: 0
Authors
Ma Z. [1]
Yang Y. [1]
Wang H. [2]
Huang L. [1]
Wei Z. [1]
Affiliations
[1] Faculty of Information Science and Engineering, Ocean University of China, Songling Road, Shandong, Qingdao
[2] College of Computer and Cyber Security, Fujian Normal University, Xuefu South Road, Fuzhou
Funding
National Natural Science Foundation of China
Keywords
Discriminative foreground; Fine-grained image recognition; Image classification; Local region feature; Observation behavior; Visual attention
DOI
10.1007/s00521-024-09961-y
Abstract
Fine-grained image recognition aims to distinguish images with subtle differences and identify the sub-categories to which they belong. Recently, the vision transformer (ViT) has achieved promising results in many computer vision tasks. In this paper, we introduce human observation behavior into ViT and propose a novel transformer-based network named ZoomViT. We divide fine-grained recognition into two steps: "look closer" and "contrast." First, looking closer means observing finer local regions and multi-scale features while avoiding the adverse effect of the background on recognition. We design a zoom-in module that tracks the attention flow by integrating the attention weights in order to zoom in on the discriminative foreground regions. Moreover, splitting the image directly into patches, as ViT does, may harm recognition. We therefore design a zoom-out module that combines overlapping cutting with downsampling to maintain both the integrity of neighboring local structures and the running efficiency of the model. Finally, we propose contrasting the features of known sub-categories to supervise the model in learning the subtle differences among sub-categories. Because the consistency of features extracted from different batches increases over time, we propose a variable-length queue that stores features from different batches so that contrastive learning can be conducted efficiently and fully. We experimentally demonstrate the state-of-the-art performance of our model on four popular fine-grained benchmarks: CUB-200-2011, Stanford Dogs, NABirds, and iNat2017. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
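The "overlapping cutting" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a sliding window whose stride is smaller than the patch size, which is one common way to make neighboring patches share pixels; the function name `split_overlapping`, the toy 6 x 6 grid, and the patch and stride values are hypothetical and are not taken from the paper, whose actual zoom-out module also includes downsampling that is not shown here.

```python
def split_overlapping(image, patch, stride):
    """Cut a 2D grid (a list of rows) into patch x patch windows.

    Windows are taken every `stride` cells. When stride < patch,
    neighboring windows overlap; when stride == patch, the split is
    the plain non-overlapping grid used by a standard ViT.
    This is an illustrative assumption, not the paper's implementation.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            # Slice `patch` rows, then `patch` columns from each row.
            window = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(window)
    return patches


# Toy 6 x 6 "image" whose cell value encodes its position (row * 6 + col).
image = [[r * 6 + c for c in range(6)] for r in range(6)]

non_overlap = split_overlapping(image, patch=2, stride=2)  # ViT-style grid
overlap = split_overlapping(image, patch=2, stride=1)      # shared borders
```

With a stride smaller than the patch size, each window shares cells with its neighbors, so a local structure that straddles a patch boundary still appears intact in at least one patch. The cost is a larger number of patches, which is presumably why the abstract pairs overlapping cutting with downsampling.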
Pages: 12775 - 12789
Number of pages: 14
Related papers
50 in total
  • [21] Fine-grained skeleton action recognition with pairwise motion salience learning
    Li H.
    Tu Z.
    Xie W.
    Zhang J.
    Scientia Sinica Informationis, 2023, 53 (12) : 2440 - 2457
  • [22] Bidirectional Attention-Recognition Model for Fine-Grained Object Classification
    Liu, Chuanbin
    Xie, Hongtao
    Zha, Zhengjun
    Yu, Lingyun
    Chen, Zhineng
    Zhang, Yongdong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (07) : 1785 - 1795
  • [23] Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding
    Chen, Tianshui
    Wu, Wenxi
    Gao, Yuefang
    Dong, Le
    Luo, Xiaonan
    Lin, Liang
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018 : 2023 - 2031
  • [24] The feature generator of hard negative samples for fine-grained image recognition
    Kim, Taehung
    Hong, Kibeom
    Byun, Hyeran
    NEUROCOMPUTING, 2021, 439 : 374 - 382
  • [25] Enhancing Mixture-of-Experts by Leveraging Attention for Fine-Grained Recognition
    Zhang, Lianbo
    Huang, Shaoli
    Liu, Wei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 4409 - 4421
  • [26] Dual Guidance Enabled Fuzzy Inference for Enhanced Fine-Grained Recognition
    Chen, Qiupu
    He, Feng
    Wang, Gang
    Bai, Xiao
    Cheng, Long
    Ning, Xin
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2025, 33 (01) : 418 - 430
  • [27] Saliency for fine-grained object recognition in domains with scarce training data
    Figueroa Flores, Carola
    Gonzalez-Garcia, Abel
    van de Weijer, Joost
    Raducanu, Bogdan
    PATTERN RECOGNITION, 2019, 94 : 62 - 73
  • [28] Refining deep convolutional features for improving fine-grained image recognition
    Zhang, Weixia
    Yan, Jia
    Shi, Wenxuan
    Feng, Tianpeng
    Deng, Dexiang
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2017
  • [29] Siamese transformer with hierarchical concept embedding for fine-grained image recognition
    Yilin Lyu
    Liping Jing
    Jiaqi Wang
    Mingzhe Guo
    Xinyue Wang
    Jian Yu
    Science China Information Sciences, 2023, 66
  • [30] Transformer with peak suppression and knowledge guidance for fine-grained image recognition
    Liu, Xinda
    Wang, Lili
    Han, Xiaoguang
    NEUROCOMPUTING, 2022, 492 : 137 - 149