SR-GNN: Spatial Relation-Aware Graph Neural Network for Fine-Grained Image Categorization

Cited by: 38
Authors
Bera, Asish [1 ]
Wharton, Zachary [2 ]
Liu, Yonghuai [2 ]
Bessis, Nik [2 ]
Behera, Ardhendu [2 ]
Affiliations
[1] Birla Inst Technol & Sci Pilani BITS, Dept Comp Sci & Informat Syst, Pilani Campus, Pilani 333031, Rajasthan, India
[2] Edge Hill Univ, Dept Comp Sci, Ormskirk L39 4QP, Lancs, England
Keywords
Feature extraction; Visualization; Proposals; Logic gates; Task analysis; Semantics; Graph neural networks; Attention mechanism; Convolutional neural networks; Human action; Fine-grained visual recognition; Relation-aware feature transformation
DOI
10.1109/TIP.2022.3205215
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Over the past few years, significant progress has been made in image recognition based on deep convolutional neural networks (CNNs), mainly due to their strong ability to mine discriminative object pose and part information from texture and shape. This is often inadequate for fine-grained visual classification (FGVC), which exhibits high intra-class and low inter-class variance due to occlusion, deformation, illumination, etc. An expressive feature representation describing global structural information is therefore key to characterizing an object or scene. To this end, we propose a method that effectively captures subtle changes by aggregating context-aware features from the most relevant image regions, together with their importance in discriminating fine-grained categories, while avoiding bounding-box and/or distinguishable-part annotations. Our approach is inspired by recent advances in self-attention and graph neural networks (GNNs): it includes a simple yet effective relation-aware feature transformation and its refinement via a context-aware attention mechanism, boosting the discriminability of the transformed features in an end-to-end learning process. Our model is evaluated on eight benchmark datasets consisting of fine-grained objects and human-object interactions, and it outperforms state-of-the-art approaches by a significant margin in recognition accuracy.
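The pipeline the abstract describes (a relation-aware transformation that propagates region features over a graph, followed by context-aware attention that weights regions for the final descriptor) can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's exact formulation: the similarity-based adjacency, the single propagation step, and all shapes and weight matrices are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_transform(regions, w_rel):
    """One step of graph propagation over region features.

    regions: (N, d) features from N image regions.
    w_rel:   (d, d) weight matrix (stands in for learnable parameters).
    """
    sim = regions @ regions.T            # (N, N) pairwise relation scores
    adj = softmax(sim, axis=-1)          # normalized adjacency: each row sums to 1
    return adj @ regions @ w_rel         # aggregate neighbors, then transform

def context_aware_attention(transformed, w_att):
    """Pool region features into one image-level descriptor.

    Each region gets a learned importance score; the descriptor is the
    score-weighted sum of the transformed region features.
    """
    scores = softmax(transformed @ w_att, axis=0)   # (N, 1) importance weights
    return (scores * transformed).sum(axis=0)       # (d,) descriptor

rng = np.random.default_rng(0)
regions = rng.standard_normal((6, 16))              # 6 regions, 16-dim features
h = relation_aware_transform(regions, 0.1 * rng.standard_normal((16, 16)))
desc = context_aware_attention(h, 0.1 * rng.standard_normal((16, 1)))
```

In the actual model these operations are learned end-to-end inside a CNN backbone; here fixed random matrices simply demonstrate the data flow from region features to a single discriminative descriptor.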
Pages: 6017-6031
Page count: 15