SwinFG: A fine-grained recognition scheme based on swin transformer

Cited by: 8
Authors
Ma, Zhipeng [1]
Wu, Xiaoyu [1,2]
Chu, Anzhuo [3]
Huang, Lei [1]
Wei, Zhiqiang [1]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266000, Peoples R China
[2] Shandong Comp Sci Ctr, Jinan, Peoples R China
[3] Univ Manchester, Oxford Rd, Manchester M13 9PL, England
Funding
National Natural Science Foundation of China
Keywords
Swin transformer; Fine-grained image recognition; Image classification; Visual attention; Local region feature; Discriminative foreground
DOI
10.1016/j.eswa.2023.123021
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained image recognition (FGIR) is challenging because it requires distinguishing sub-categories that differ only subtly. The Swin Transformer has recently shown impressive performance in various fields. Our research shows that the Swin Transformer, applied directly to FGIR, is already highly effective compared with many other approaches, and that it can be further enhanced with adaptive improvements. In this paper, we propose a novel Swin Transformer based architecture, named SwinFG, which enhances FGIR by leveraging shifted-window self-attention to locate discriminative regions. The self-attention computation fuses image patches according to attention weights, so the subsequent influence of each patch can be tracked and its contribution to the extracted feature determined. This forms the basis for locating discriminative regions. To this end, we propose a series of transformations that integrate the attention weights of the local windows in each block into attention maps, which can be recursively multiplied to track changes in the attention weights. Because the discriminative regions are not entirely occupied by the foreground object, background information is inevitably also expressed in the extracted feature. To address this, we apply contrastive learning to features obtained from the discriminative and background regions of a single image, enlarging the distance between them and further eliminating any potential influence from the background. We demonstrate state-of-the-art performance of our model on four popular fine-grained benchmarks. The code is available at https://anonymous.4open.science/r/swinFG-1DCE.
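The idea of integrating per-window attention weights into attention maps and multiplying them recursively can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `windows_to_map` and `rollout` are hypothetical, tokens are assumed to be ordered window by window, window shifting and patch merging are ignored, and the identity term for the residual connection is an assumption borrowed from generic attention rollout rather than something the abstract states.

```python
import numpy as np


def windows_to_map(window_attn):
    """Place per-window attention (k, w, w) on the diagonal of a (k*w, k*w) map.

    Assumption: tokens are ordered window by window, so each window
    occupies one contiguous diagonal block; shifts are not modeled.
    """
    k, w, _ = window_attn.shape
    full = np.zeros((k * w, k * w))
    for i in range(k):
        start = i * w
        full[start:start + w, start:start + w] = window_attn[i]
    return full


def rollout(attention_maps, add_residual=True):
    """Recursively multiply per-block attention maps, first block first.

    Assumption: an identity term stands in for the residual connection,
    and rows are renormalized so each map stays row-stochastic.
    """
    n = attention_maps[0].shape[-1]
    joint = np.eye(n)
    for attn in attention_maps:
        a = attn + np.eye(n) if add_residual else attn
        a = a / a.sum(axis=-1, keepdims=True)
        joint = a @ joint  # later blocks act on already-mixed tokens
    return joint


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = []
    for _ in range(3):
        raw = rng.random((2, 3, 3))  # 2 windows of 3 tokens each
        blocks.append(windows_to_map(raw / raw.sum(-1, keepdims=True)))
    joint = rollout(blocks)
    print(joint.shape)
```

Row i of the resulting matrix describes how strongly each input patch contributes to output token i under these assumptions; in this simplified setting, where every block uses the same window partition, contributions never cross window boundaries, which is the limitation that shifted windows are meant to address.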
Pages: 9