SwinFG: A fine-grained recognition scheme based on swin transformer

Cited by: 8
Authors
Ma, Zhipeng [1]
Wu, Xiaoyu [1, 2]
Chu, Anzhuo [3]
Huang, Lei [1]
Wei, Zhiqiang [1]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266000, Peoples R China
[2] Shandong Comp Sci Ctr, Jinan, Peoples R China
[3] Univ Manchester, Oxford Rd, Manchester M13 9PL, England
Funding
National Natural Science Foundation of China;
Keywords
Swin transformer; Fine-grained image recognition; Image classification; Visual attention; Local region feature; Discriminative foreground;
DOI
10.1016/j.eswa.2023.123021
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fine-grained image recognition (FGIR) is a challenging task as it requires the recognition of sub-categories with subtle differences. Recently, the swin transformer has shown impressive performance in various fields. Our research has shown that the swin transformer applied directly to FGIR is also highly effective compared to many other approaches and can be further enhanced with adaptive improvements. In this paper, we propose a novel swin-transformer-based architecture, named SwinFG, which enhances FGIR by leveraging shifted-window-based self-attention to locate discriminative regions. The self-attention computation fuses image patches together based on attention weights, enabling the subsequent influence of each patch to be tracked and its contribution to the extracted feature to be determined. This forms the basis for locating discriminative regions. To this end, we propose a series of transformations that integrate the attention weights of local windows in each block into attention maps, which can be recursively multiplied to track changes in the attention weights. As the discriminative regions are not entirely occupied by the foreground object, background information is inevitably expressed in the extracted feature as well. To address this, we propose conducting contrastive learning on features obtained from both the discriminative and background regions of a single image to enlarge their distance and further eliminate any potential influence from the background. We demonstrate the state-of-the-art performance of our model on four popular fine-grained benchmarks. (The code is available at https://anonymous.4open.science/r/swinFG-1DCE).
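The recursive multiplication of attention maps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-window attention weights of each block have already been merged into one N x N row-stochastic map over the same N patches, and it ignores residual connections, multiple heads, and patch merging between stages. The function names `matmul`, `rollout`, and `most_discriminative_patch` are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rollout(attention_maps):
    """Recursively multiply per-block attention maps (first block first).

    Assumption: each map is an N x N row-stochastic matrix over the same
    N patches. Entry [i][j] of the result estimates how much input patch j
    contributes to the feature at position i after all blocks.
    """
    n = len(attention_maps[0])
    # Start from the identity: before any block, each patch is only itself.
    tracked = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in attention_maps:
        tracked = matmul(attn, tracked)
    return tracked


def most_discriminative_patch(attention_maps):
    """Index of the input patch with the largest total tracked contribution."""
    tracked = rollout(attention_maps)
    n = len(tracked)
    totals = [sum(tracked[i][j] for i in range(n)) for j in range(n)]
    return max(range(n), key=totals.__getitem__)


# Toy example: two blocks over three patches, both favouring patch 1.
block = [[0.5, 0.5, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.5, 0.5]]
tracked = rollout([block, block])
selected = most_discriminative_patch([block, block])
```

In this toy case the tracked contributions concentrate on the middle patch, which is the kind of signal that could be used to pick a discriminative region; how SwinFG actually selects regions is described in the paper itself.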
Pages: 9
Related papers
50 in total
  • [21] Swin-Panda: Behavior Recognition for Giant Pandas Based on Local Fine-Grained and Spatiotemporal Displacement Features
    Yi, Xinyu
    Su, Han
    Min, Peng
    He, Mengnan
    Han, Yimin
    Luo, Gai
    Wu, Pengcheng
    Min, Qingyue
    Hou, Rong
    Chen, Peng
    DIVERSITY-BASEL, 2025, 17 (02):
  • [22] SFRSwin: A Shallow Significant Feature Retention Swin Transformer for Fine-Grained Image Classification of Wildlife Species
    Wang, Shuai
    Han, Yubing
    Song, Shouliang
    Zhu, Honglei
    Zhang, Li
    Dong, Anming
    Yu, Jiguo
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX, 2024, 14433 : 232 - 243
  • [23] A Transformer-based Late-Fusion Mechanism for Fine-Grained Object Recognition in Videos
    Koch, Jannik
    Wolf, Stefan
    Beyerer, Juergen
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW), 2023, : 100 - 109
  • [24] Fine-grained visual classification based on compact Vision transformer
    Xu H.
    Guo L.
    Li R.-Z.
    Kongzhi yu Juece/Control and Decision, 2024, 39 (03): : 893 - 900
  • [25] Fine-Grained Image Classification Model Based on Improved Transformer
    Tian Zhansheng
    Liu Libo
    LASER & OPTOELECTRONICS PROGRESS, 2023, 60 (02)
  • [26] TransFGVC: transformer-based fine-grained visual classification
    Shen, Longfeng
    Hou, Bin
    Jian, Yulei
    Tu, Xisong
    Zhang, Yingjie
    Shuai, Lingying
    Ge, Fangzhen
    Chen, Debao
    VISUAL COMPUTER, 2025, 41 (04): : 2439 - 2459
  • [27] Collaborative Representation based Fine-grained Species Recognition
    Chakraborti, Tapabrata
    McCane, Brendan
    Mills, Steven
    Pal, Umapada
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2016, : 42 - 47
  • [28] TransFGVC: transformer-based fine-grained visual classification
    Longfeng Shen
    Bin Hou
    Yulei Jian
    Xisong Tu
    Yingjie Zhang
    Lingying Shuai
    Fangzhen Ge
    Debao Chen
    The Visual Computer, 2025, 41 (4) : 2439 - 2459
  • [29] Fine-Grained Temporal-Enhanced Transformer for Dynamic Facial Expression Recognition
    Zhang, Yaning
    Zhang, Jiahe
    Shen, Linlin
    Yu, Zitong
    Gao, Zan
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2560 - 2564
  • [30] Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition
    Liu, Dichao
    Wang, Yu
    Kato, Jien
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (12) : 2577 - 2586