TransFGVC: transformer-based fine-grained visual classification

Cited by: 0
Authors
Shen, Longfeng [1 ,2 ,4 ]
Hou, Bin [1 ,4 ]
Jian, Yulei [1 ,2 ,4 ]
Tu, Xisong [1 ,4 ]
Zhang, Yingjie [1 ,4 ]
Shuai, Lingying [3 ]
Ge, Fangzhen [1 ,2 ,4 ]
Chen, Debao [1 ,2 ,4 ]
Affiliations
[1] Huaibei Normal Univ, Sch Comp Sci & Technol, Anhui Engn Res Ctr Intelligent Comp & Applicat Cog, 100 Dongshen Rd, Huaibei 235000, Anhui, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, 5089 Wangjiang West Rd, Hefei 230088, Anhui, Peoples R China
[3] Huaibei Normal Univ, Coll Life Sci, 100 Dongshan Rd, Huaibei 235000, Anhui, Peoples R China
[4] Huaibei Normal Univ, Anhui Big Data Res Ctr Univ Manage, 100 Dongshen Rd, Huaibei 235000, Anhui, Peoples R China
Source
VISUAL COMPUTER | 2025, Vol. 41, Issue 4
Funding
National Natural Science Foundation of China;
Keywords
Computer vision; Fine-grained visual classification; LSTM; Swin Transformer; Birds-267-2022 dataset;
DOI
10.1007/s00371-024-03545-6
CLC Classification Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Fine-grained visual classification (FGVC) aims to identify subcategories of objects within the same superclass. This task is challenging owing to high intra-class variance and low inter-class variance. Most recent methods focus on locating discriminative regions and then training a classification network to capture the subtle differences among them. On the one hand, the detection network often localizes an entire part of the object, so positioning errors occur. On the other hand, these methods ignore the correlations between the extracted regions. We propose a novel, highly scalable approach, called TransFGVC, that combines Swin Transformers with long short-term memory (LSTM) networks to address these problems. The Swin Transformer obtains informative visual tokens through stacked self-attention layers, and the LSTM models them globally, which not only accurately locates the discriminative regions but also introduces the global information that is important for FGVC. The proposed method achieves competitive performance, with accuracy rates of 92.7%, 91.4%, and 91.5% on the public CUB-200-2011 and NABirds datasets and our Birds-267-2022 dataset, while its parameter count and FLOPs are 25% and 27% lower, respectively, than those of the current state-of-the-art (SotA) method HERBS. To further promote the development of FGVC, we built the Birds-267-2022 dataset, which contains 267 categories and 12,233 images.
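The pipeline the abstract describes (Swin Transformer tokens modeled globally by an LSTM, then classified) can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: the backbone is omitted (random tokens stand in for Swin features), and the module name `TransFGVCHead`, the token/hidden dimensions, and mean pooling over LSTM outputs are all assumptions.

```python
import torch
import torch.nn as nn

class TransFGVCHead(nn.Module):
    """Hypothetical head: an LSTM globally models visual tokens from a
    Swin-style backbone, and a linear layer predicts the fine-grained class.
    Dimensions are illustrative (768-d tokens, 267 bird classes)."""

    def __init__(self, token_dim=768, hidden_dim=512, num_classes=267):
        super().__init__()
        # Bidirectional LSTM lets each token attend to global context
        # from both directions of the token sequence.
        self.lstm = nn.LSTM(token_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, tokens):            # tokens: (B, N, token_dim)
        out, _ = self.lstm(tokens)        # (B, N, 2 * hidden_dim)
        pooled = out.mean(dim=1)          # average over all tokens
        return self.classifier(pooled)    # (B, num_classes)

# Example: batch of 2 images, 49 tokens (a 7x7 grid) from a Swin stage.
tokens = torch.randn(2, 49, 768)
logits = TransFGVCHead()(tokens)
print(logits.shape)  # torch.Size([2, 267])
```

The mean pooling here is one simple choice for aggregating the LSTM outputs; the paper's actual aggregation and training details are in the full text.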
Pages: 2439-2459
Page count: 21
Related Papers
50 items in total
  • [21] FEATURE COMPARISON BASED CHANNEL ATTENTION FOR FINE-GRAINED VISUAL CLASSIFICATION
    Jia, Shukun
    Bai, Yan
    Zhang, Jing
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1776 - 1780
  • [22] Recombining Vision Transformer Architecture for Fine-Grained Visual Categorization
    Deng, Xuran
    Liu, Chuanbin
    Lu, Zhiying
    MULTIMEDIA MODELING, MMM 2023, PT II, 2023, 13834 : 127 - 138
  • [23] SoyaTrans: A novel transformer model for fine-grained visual classification of soybean leaf disease diagnosis
    Sharma, Vivek
    Tripathi, Ashish Kumar
    Mittal, Himanshu
    Nkenyereye, Lewis
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 260
  • [24] Fine-Grained Ship Classification by Combining CNN and Swin Transformer
    Huang, Liang
    Wang, Fengxiang
    Zhang, Yalun
    Xu, Qingxia
    REMOTE SENSING, 2022, 14 (13)
  • [25] MASK-VIT: AN OBJECT MASK EMBEDDING IN VISION TRANSFORMER FOR FINE-GRAINED VISUAL CLASSIFICATION
    Su, Tong
    Ye, Shuo
    Song, Chengqun
    Cheng, Jun
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1626 - 1630
  • [26] Efficient Image Embedding for Fine-Grained Visual Classification
    Payatsuporn, Soranan
    Kijsirikul, Boonserm
    2022-14TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SMART TECHNOLOGY (KST 2022), 2022, : 40 - 45
  • [27] Adaptive Destruction Learning for Fine-grained Visual Classification
    Zhang, Riheng
    Tan, Min
    Mao, Xiaoyang
    Gao, Zhigang
    Gu, Xiaoling
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 946 - 950
  • [28] Exploration of Class Center for Fine-Grained Visual Classification
    Yao, Hang
    Miao, Qiguang
    Zhao, Peipei
    Li, Chaoneng
    Li, Xin
    Feng, Guanwen
    Liu, Ruyi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9954 - 9966
  • [29] A sparse focus framework for visual fine-grained classification
    Wang, YongXiong
    Li, Guangjun
    Ma, Li
    Multimedia Tools and Applications, 2021, 80 : 25271 - 25289