Fine-Grained Visual Classification Network Based on Fusion Pooling and Attention Enhancement

Cited by: 0
Authors
Xiao B. [1]
Guo J. [1]
Zhang X. [1]
Wang M. [2]
Affiliations
[1] School of Computer Science, Southwest Petroleum University, Chengdu
[2] School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu
Keywords
Attention Mechanism; Data Augmentation; Fine-Grained Visual Classification; Fusion Pooling
DOI
10.16451/j.cnki.issn1003-6059.202307007
Abstract
The core of fine-grained visual classification is extracting discriminative image features. Most existing methods introduce attention mechanisms to focus the network on important regions of the object. However, such approaches locate only the most salient features and cannot cover all discriminative features, so different categories with similar features are easily confused. To obtain comprehensive discriminative features, a fine-grained visual classification network based on fusion pooling and attention enhancement is proposed. At the end of the network, a fusion pooling module with a three-branch structure is designed to obtain multi-scale discriminative features. The three branches are global average pooling, global top-k pooling, and the fusion of the two. In addition, an attention enhancement module is proposed that, guided by attention maps, generates two more discriminative images through an attention grid mixing module and an attention cropping module. Experiments on the fine-grained image datasets CUB-200-2011, Stanford Cars, and FGVC-Aircraft verify the high accuracy and strong competitiveness of the proposed network. © 2023 Journal of Pattern Recognition and Artificial Intelligence. All rights reserved.
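The three pooling branches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the third branch fuses the other two, so the weighted average below, the weight `alpha`, and all function names are assumptions made for illustration only. A feature map is represented as a list of channels, each flattened to its spatial activations.

```python
from typing import List, Tuple

# C channels, each flattened to its H*W spatial activations.
FeatureMap = List[List[float]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Mean of all spatial activations in each channel."""
    return [sum(channel) / len(channel) for channel in fmap]


def global_topk_pool(fmap: FeatureMap, k: int) -> List[float]:
    """Mean of the k largest spatial activations in each channel."""
    if k < 1:
        raise ValueError("k must be at least 1")
    pooled = []
    for channel in fmap:
        # Slicing keeps every activation when k exceeds the channel size.
        top = sorted(channel, reverse=True)[:k]
        pooled.append(sum(top) / len(top))
    return pooled


def fusion_pool(
    fmap: FeatureMap, k: int, alpha: float = 0.5
) -> Tuple[List[float], List[float], List[float]]:
    """Return the three branch outputs: average, top-k, and their fusion.

    ASSUMPTION: the fusion branch is a weighted average of the other two
    branches. The abstract does not specify the actual fusion rule.
    """
    gap = global_average_pool(fmap)
    gtk = global_topk_pool(fmap, k)
    fused = [alpha * a + (1.0 - alpha) * t for a, t in zip(gap, gtk)]
    return gap, gtk, fused


# Toy feature map: 2 channels, 4 spatial positions each.
fmap = [[1.0, 2.0, 3.0, 6.0], [0.0, 0.0, 4.0, 8.0]]
gap, gtk, fused = fusion_pool(fmap, k=2)
print(gap)    # [3.0, 3.0]
print(gtk)    # [4.5, 6.0]
print(fused)  # [3.75, 4.5]
```

In this toy input both channels have the same average, so global average pooling cannot tell them apart, while top-k pooling separates them by their strongest responses. When k equals the number of spatial positions, top-k pooling reduces to global average pooling.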
Pages: 661-670
Page count: 9