SwinFG: A fine-grained recognition scheme based on swin transformer

Cited by: 8
Authors
Ma, Zhipeng [1]
Wu, Xiaoyu [1, 2]
Chu, Anzhuo [3]
Huang, Lei [1]
Wei, Zhiqiang [1]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266000, Peoples R China
[2] Shandong Comp Sci Ctr, Jinan, Peoples R China
[3] Univ Manchester, Oxford Rd, Manchester M13 9PL, England
Funding
National Natural Science Foundation of China;
Keywords
Swin transformer; Fine-grained image recognition; Image classification; Visual attention; Local region feature; Discriminative foreground;
DOI
10.1016/j.eswa.2023.123021
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fine-grained image recognition (FGIR) is challenging because it requires distinguishing sub-categories that differ only subtly. The Swin Transformer has recently shown impressive performance in various fields. Our research shows that the Swin Transformer, applied directly to FGIR, is already highly effective compared with many other approaches, and that it can be further improved with task-specific adaptations. In this paper, we propose a novel Swin Transformer based architecture, named SwinFG, which enhances FGIR by leveraging shifted-window self-attention to locate discriminative regions. Self-attention fuses image patches according to attention weights, so the subsequent influence of each patch can be tracked and its contribution to the extracted feature determined. This forms the basis for locating discriminative regions. To this end, we propose a series of transformations that integrate the attention weights of the local windows in each block into attention maps, which can be recursively multiplied to track changes in the attention weights across blocks. Because the discriminative regions are not entirely occupied by the foreground object, background information is inevitably expressed in the extracted feature as well. To address this, we apply contrastive learning to features obtained from the discriminative and background regions of a single image, enlarging the distance between them and further suppressing the influence of the background. We demonstrate state-of-the-art performance of our model on four popular fine-grained benchmarks. (The code is available at https://anonymous.4open.science/r/swinFG-1DCE).
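The recursive multiplication of attention maps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the window attention of every block has already been integrated into one full N x N map with rows summing to 1, that N stays fixed across blocks (patch merging is ignored), and that residual connections are omitted. The names `rollout`, `top_k_patches`, and `random_attention` are hypothetical and used only for illustration.

```python
import numpy as np


def rollout(attention_maps):
    """Recursively multiply per-block attention maps.

    Each map is assumed to be an N x N row-stochastic matrix whose entry
    [i, j] is the weight output token i places on input token j.
    """
    n = attention_maps[0].shape[0]
    joint = np.eye(n)
    for attn in attention_maps:
        # Compose this block with everything tracked so far.
        joint = attn @ joint
    return joint


def top_k_patches(joint, k):
    """Return indices of the k input patches with the largest mean contribution.

    Averaging over rows assumes the final feature is an average over all
    output tokens (an assumption of this sketch).
    """
    contribution = joint.mean(axis=0)
    return np.argsort(contribution)[::-1][:k]


rng = np.random.default_rng(0)


def random_attention(n):
    """Build a random row-stochastic matrix as a stand-in for real attention."""
    logits = rng.normal(size=(n, n))
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    return weights / weights.sum(axis=1, keepdims=True)


# Four hypothetical blocks over a 4 x 4 grid of patches (16 tokens).
maps = [random_attention(16) for _ in range(4)]
joint = rollout(maps)
selected = top_k_patches(joint, k=4)
```

Under these assumptions, `joint[i, j]` estimates how much the final token i draws from input patch j, and `selected` holds the candidate discriminative patches. The remaining patches could serve as the background set for the contrastive step the abstract describes.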
Pages: 9