TransFGVC: transformer-based fine-grained visual classification

Cited by: 0
|
Authors
Shen, Longfeng [1 ,2 ,4 ]
Hou, Bin [1 ,4 ]
Jian, Yulei [1 ,2 ,4 ]
Tu, Xisong [1 ,4 ]
Zhang, Yingjie [1 ,4 ]
Shuai, Lingying [3 ]
Ge, Fangzhen [1 ,2 ,4 ]
Chen, Debao [1 ,2 ,4 ]
Affiliations
[1] Huaibei Normal Univ, Sch Comp Sci & Technol, Anhui Engn Res Ctr Intelligent Comp & Applicat Cog, 100 Dongshen Rd, Huaibei 235000, Anhui, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, 5089 Wangjiang West Rd, Hefei 230088, Anhui, Peoples R China
[3] Huaibei Normal Univ, Coll Life Sci, 100 Dongshan Rd, Huaibei 235000, Anhui, Peoples R China
[4] Huaibei Normal Univ, Anhui Big Data Res Ctr Univ Manage, 100 Dongshen Rd, Huaibei 235000, Anhui, Peoples R China
Source
VISUAL COMPUTER | 2025 / Volume 41 / Issue 04
Funding
National Natural Science Foundation of China;
Keywords
Computer vision; Fine-grained visual classification; LSTM; Swin Transformer; Birds-267-2022 dataset;
DOI
10.1007/s00371-024-03545-6
Chinese Library Classification
TP31 [Computer software];
Discipline Classification Codes
081202 ; 0835 ;
Abstract
Fine-grained visual classification (FGVC) aims to identify subcategories of objects within the same superclass. This task is challenging owing to high intra-class variance and low inter-class variance. The most recent methods focus on locating discriminative areas and then training the classification network to further capture the subtle differences among them. On the one hand, the detection network often captures an entire part of the object, and positioning errors occur. On the other hand, these methods ignore the correlations between the extracted regions. We propose a novel, highly scalable approach, called TransFGVC, that combines Swin Transformers with long short-term memory (LSTM) networks to address the above problems. The Swin Transformer is used to obtain remarkable visual tokens through self-attention layer stacking, and the LSTM is used to model them globally, which not only accurately locates the discriminative region but also introduces global information that is important for FGVC. The proposed method achieves competitive performance, with accuracy rates of 92.7%, 91.4% and 91.5% on the public CUB-200-2011 and NABirds datasets and our Birds-267-2022 dataset, respectively, while its parameter count and FLOPs are 25% and 27% lower, respectively, than those of the current state-of-the-art (SOTA) method, HERBS. To effectively promote the development of FGVC, we developed the Birds-267-2022 dataset, which has 267 categories and 12,233 images.
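The abstract describes the pipeline only at a high level: a Swin Transformer produces a sequence of visual tokens, and an LSTM then models that sequence globally before classification. As a rough illustration of that second step (this is not the authors' code, and all sizes are assumptions: 49 tokens of dimension 96, hidden size 32, 267 classes to match Birds-267-2022, and a simple mean-pooled linear classifier), the token-sequence-to-LSTM idea can be sketched in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_forward(tokens, Wx, Wh, b):
    """Run a single-layer LSTM over a sequence of visual tokens.

    tokens: (T, d) array, e.g. token sequence from a Swin-style backbone
    Wx: (d, 4H) input weights, Wh: (H, 4H) recurrent weights, b: (4H,) bias
    Returns hs: (T, H) hidden states, one per token.
    """
    T, _ = tokens.shape
    H = Wh.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hs = np.zeros((T, H))
    for t in range(T):
        z = tokens[t] @ Wx + h @ Wh + b          # all four gates at once, (4H,)
        i, f, o, g = np.split(z, 4)              # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        hs[t] = h                                # each state has seen all prior tokens
    return hs

# Illustrative sizes (assumptions, not the paper's configuration):
T, d, H, C = 49, 96, 32, 267                     # tokens, token dim, hidden, classes
tokens = rng.standard_normal((T, d))             # stand-in for Swin visual tokens
Wx = rng.standard_normal((d, 4 * H)) * 0.1
Wh = rng.standard_normal((H, 4 * H)) * 0.1
b = np.zeros(4 * H)
W_cls = rng.standard_normal((H, C)) * 0.1

hs = lstm_forward(tokens, Wx, Wh, b)             # (49, 32) globally modeled tokens
logits = hs.mean(axis=0) @ W_cls                 # pool over tokens, then classify
print(logits.shape)                              # (267,)
```

The point of the recurrence is that each hidden state depends on every earlier token, so the pooled representation carries the global, cross-region context that, per the abstract, purely local region-detection methods miss.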
Pages: 2439-2459 (21 pages)
Related Papers
50 records
  • [41] Multistage attention region supplement transformer for fine-grained visual categorization
    Mei, Aokun
    Huo, Hua
    Xu, Jiaxin
    Xu, Ningya
    VISUAL COMPUTER, 2025, 41 (03): 1873-1889
  • [42] MFF-Trans: Multi-level Feature Fusion Transformer for Fine-Grained Visual Classification
    Hang, Qi
    Yan, Xuefeng
    Gong, Lina
    WEB AND BIG DATA, PT III, APWEB-WAIM 2023, 2024, 14333: 220-234
  • [43] Fine-grained sentiment classification based on HowNet
    Li, Wen
    Chen, Yuefeng
    Wang, Weili
    Journal of Convergence Information Technology, 2012, 7 (19): 86-92
  • [45] A Progressive Gated Attention Model for Fine-Grained Visual Classification
    Zhu, Qiangxi
    Li, Zhixin
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023: 2063-2068
  • [46] Learning Hierarchal Channel Attention for Fine-grained Visual Classification
    Guan, Xiang
    Wang, Guoqing
    Xu, Xing
    Bin, Yi
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 5011-5019
  • [47] Using Coarse Label Constraint for Fine-Grained Visual Classification
    Lu, Chaohao
    Zou, Yuexian
    MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296: 266-277
  • [48] A collaborative gated attention network for fine-grained visual classification
    Zhu, Qiangxi
    Kuang, Wenlan
    Li, Zhixin
    DISPLAYS, 2023, 79
  • [49] Symmetrical irregular local features for fine-grained visual classification
    Yang, Ming
    Xu, Yang
    Wu, Zebin
    Wei, Zhihui
    NEUROCOMPUTING, 2022, 505: 304-314
  • [50] Fine-grained Image Classification by Visual-Semantic Embedding
    Xu, Huapeng
    Qi, Guilin
    Li, Jingjing
    Wang, Meng
    Xu, Kang
    Gao, Huan
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018: 1043-1049