FET-FGVC: Feature-enhanced transformer for fine-grained visual classification

Cited by: 15
Authors
Chen, Huazhen [1]
Zhang, Haimiao [2]
Liu, Chang [2]
An, Jianpeng [1]
Gao, Zhongke [1]
Qiu, Jun [2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Inst Appl Math, Beijing 100192, Peoples R China
Keywords
Fine-grained visual classification (FGVC); Transformer; Graph convolutional network (GCN); Feature enhancement; NETWORK;
DOI
10.1016/j.patcog.2024.110265
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The challenge of fine-grained visual classification (FGVC) comes from small variations between classes and large variations within classes. Inspired by the fact that identifying bird species relies not only on the global features of the subject area but also on subtle details in local areas, we propose a feature-enhanced Transformer to improve FGVC performance. The proposed method consists of three parts: a Dynamic Swin Transformer backbone that extracts comprehensive global image features through continuous attention aggregation; a GCN-based local branch that separates and enhances local features in different regions; and a pairwise feature interaction (PFI) module that enhances global features through interactions between image pairs. Extensive experiments on five FGVC datasets show that, by fusing the enhanced global and local features, our method achieves the best accuracy among the existing methods compared. The method also has an advantage in computational efficiency.
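To make the idea of a GCN-based branch over local region features more concrete, the following is a minimal sketch of one generic graph-convolution step using the widely used symmetric normalization with self-loops, ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It is not the authors' implementation: the abstract does not specify how regions are selected, how the graph is built, or which layer form is used, so the function names, the toy graph, and the identity weight matrix below are all hypothetical and chosen only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(features, adj, weight):
    """One propagation step: ReLU(A_norm @ features @ weight)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical toy input: 3 region nodes with 2-dimensional features,
# connected as a chain 0 - 1 - 2. None of these values come from the paper.
region_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity_weight = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a learned weight matrix

enhanced = gcn_layer(region_features, adjacency, identity_weight)
```

In this sketch each region's output mixes its own feature with those of its graph neighbours, which is one common way for related regions to reinforce each other. In a trained model the weight matrix would be learned and the graph would be derived from the image rather than fixed by hand.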
Pages: 13