TransFGVC: transformer-based fine-grained visual classification

Cited by: 0
|
Authors
Shen, Longfeng [1 ,2 ,4 ]
Hou, Bin [1 ,4 ]
Jian, Yulei [1 ,2 ,4 ]
Tu, Xisong [1 ,4 ]
Zhang, Yingjie [1 ,4 ]
Shuai, Lingying [3 ]
Ge, Fangzhen [1 ,2 ,4 ]
Chen, Debao [1 ,2 ,4 ]
Affiliations
[1] Huaibei Normal Univ, Sch Comp Sci & Technol, Anhui Engn Res Ctr Intelligent Comp & Applicat Cog, 100 Dongshen Rd, Huaibei 235000, Anhui, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, 5089 Wangjiang West Rd, Hefei 230088, Anhui, Peoples R China
[3] Huaibei Normal Univ, Coll Life Sci, 100 Dongshan Rd, Huaibei 235000, Anhui, Peoples R China
[4] Huaibei Normal Univ, Anhui Big Data Res Ctr Univ Manage, 100 Dongshen Rd, Huaibei 235000, Anhui, Peoples R China
Source
VISUAL COMPUTER | 2025 / Volume 41 / Issue 04
Funding
National Natural Science Foundation of China;
Keywords
Computer vision; Fine-grained visual classification; LSTM; Swin Transformer; Birds-267-2022 dataset;
DOI
10.1007/s00371-024-03545-6
Chinese Library Classification
TP31 [Computer software];
Discipline Classification Codes
081202 ; 0835 ;
Abstract
Fine-grained visual classification (FGVC) aims to identify subcategories of objects within the same superclass. This task is challenging owing to high intra-class variance and low inter-class variance. The most recent methods focus on locating discriminative areas and then training the classification network to further capture the subtle differences among them. On the one hand, the detection network often captures an entire part of the object, and positioning errors occur. On the other hand, these methods ignore the correlations between the extracted regions. We propose a novel, highly scalable approach, called TransFGVC, that combines Swin Transformers with long short-term memory (LSTM) networks to address the above problems. The Swin Transformer is used to obtain remarkable visual tokens through self-attention layer stacking, and the LSTM is used to model them globally, which not only accurately locates the discriminative region but also introduces global information that is important for FGVC. The proposed method achieves competitive performance, with accuracy rates of 92.7%, 91.4% and 91.5% on the public CUB-200-2011 and NABirds datasets and our Birds-267-2022 dataset, respectively, while its parameter count and FLOPs are 25% and 27% lower, respectively, than those of the current state-of-the-art (SOTA) method, HERBS. To effectively promote the development of FGVC, we developed the Birds-267-2022 dataset, which has 267 categories and 12,233 images.
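The abstract describes the pipeline only at a high level: a Swin Transformer produces a sequence of visual tokens, and an LSTM then models that sequence globally before classification. As a rough illustration of that second step (this is not the authors' code, and all sizes are assumptions: 49 tokens of dimension 96, hidden size 32, 267 classes to match Birds-267-2022, and a simple mean-pooled linear classifier), the token-sequence-to-LSTM idea can be sketched in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_forward(tokens, Wx, Wh, b):
    """Run a single-layer LSTM over a sequence of visual tokens.

    tokens: (T, d) array, e.g. token sequence from a Swin-style backbone
    Wx: (d, 4H) input weights, Wh: (H, 4H) recurrent weights, b: (4H,) bias
    Returns hs: (T, H) hidden states, one per token.
    """
    T, _ = tokens.shape
    H = Wh.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hs = np.zeros((T, H))
    for t in range(T):
        z = tokens[t] @ Wx + h @ Wh + b          # all four gates at once, (4H,)
        i, f, o, g = np.split(z, 4)              # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        hs[t] = h                                # each state has seen all prior tokens
    return hs

# Illustrative sizes (assumptions, not the paper's configuration):
T, d, H, C = 49, 96, 32, 267                     # tokens, token dim, hidden, classes
tokens = rng.standard_normal((T, d))             # stand-in for Swin visual tokens
Wx = rng.standard_normal((d, 4 * H)) * 0.1
Wh = rng.standard_normal((H, 4 * H)) * 0.1
b = np.zeros(4 * H)
W_cls = rng.standard_normal((H, C)) * 0.1

hs = lstm_forward(tokens, Wx, Wh, b)             # (49, 32) globally modeled tokens
logits = hs.mean(axis=0) @ W_cls                 # pool over tokens, then classify
print(logits.shape)                              # (267,)
```

The point of the recurrence is that each hidden state depends on every earlier token, so the pooled representation carries the global, cross-region context that, per the abstract, purely local region-detection methods miss.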
Pages: 2439-2459 (21 pages)
Related Papers
50 records
  • [41] Multistage attention region supplement transformer for fine-grained visual categorization
    Mei, Aokun
    Huo, Hua
    Xu, Jiaxin
    Xu, Ningya
    VISUAL COMPUTER, 2025, 41 (03): 1873-1889
  • [42] MFF-Trans: Multi-level Feature Fusion Transformer for Fine-Grained Visual Classification
    Hang, Qi
    Yan, Xuefeng
    Gong, Lina
    WEB AND BIG DATA, PT III, APWEB-WAIM 2023, 2024, 14333: 220-234
  • [43] Fine-grained sentiment classification based on HowNet
    Li, Wen
    Chen, Yuefeng
    Wang, Weili
    Journal of Convergence Information Technology, 2012, 7 (19): 86-92
  • [45] A Progressive Gated Attention Model for Fine-Grained Visual Classification
    Zhu, Qiangxi
    Li, Zhixin
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023: 2063-2068
  • [46] Learning Hierarchal Channel Attention for Fine-grained Visual Classification
    Guan, Xiang
    Wang, Guoqing
    Xu, Xing
    Bin, Yi
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 5011-5019
  • [47] Using Coarse Label Constraint for Fine-Grained Visual Classification
    Lu, Chaohao
    Zou, Yuexian
    MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296: 266-277
  • [48] A collaborative gated attention network for fine-grained visual classification
    Zhu, Qiangxi
    Kuang, Wenlan
    Li, Zhixin
    DISPLAYS, 2023, 79
  • [49] Symmetrical irregular local features for fine-grained visual classification
    Yang, Ming
    Xu, Yang
    Wu, Zebin
    Wei, Zhihui
    NEUROCOMPUTING, 2022, 505: 304-314
  • [50] Fine-grained Image Classification by Visual-Semantic Embedding
    Xu, Huapeng
    Qi, Guilin
    Li, Jingjing
    Wang, Meng
    Xu, Kang
    Gao, Huan
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018: 1043-1049