Learning Scale-Consistent Attention Part Network for Fine-Grained Image Recognition

Cited by: 23
Authors
Liu, Huabin [1]
Li, Jianguo [2]
Li, Dian [3]
See, John [4]
Lin, Weiyao [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[2] Ant Financial Serv Grp, Beijing 101100, Peoples R China
[3] Tencent Technol Beijing Co Ltd, Beijing 100080, Peoples R China
[4] Heriot Watt Univ, Sch Math & Comp Sci, Putrajaya 62200, Malaysia
Funding
National Natural Science Foundation of China
Keywords
Image recognition; Task analysis; Logic gates; Location awareness; Visualization; Training; Object detection; Fine-grained image recognition; scale-consistent; attention part;
DOI
10.1109/TMM.2021.3090274
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Discriminative region localization and feature learning are crucial for fine-grained visual recognition. Existing approaches address this with attention mechanisms or part-based methods, but they neglect the consistency between attention and local parts, as well as the rich relational information among parts. This paper proposes a Scale-consistent Attention Part Network (SCAPNet) to address these shortcomings. It seamlessly integrates three novel modules: the grid gate attention unit (gGAU), scale-consistent attention part selection (SCAPS), and part relation modeling (PRM). The gGAU module represents grid regions at a given fine scale with middle-layer CNN features and produces hard attention maps with a lightweight Gumbel-Max-based gate. The SCAPS module uses attention to guide part selection across multiple scales and keeps the selection scale-consistent. The PRM module uses the self-attention mechanism to model the relationships among parts based on their appearance and relative geo-positions. SCAPNet can be trained end-to-end and demonstrates state-of-the-art accuracy on several publicly available fine-grained recognition datasets (CUB-200-2011, FGVC-Aircraft, Veg200, and Fru92).
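The abstract mentions a Gumbel-Max-based gate that produces hard attention maps. As a rough illustration of the general Gumbel-Max trick only (not the paper's gGAU implementation), the sketch below samples a binary on/off decision per grid cell. The function names, the two-logit (off, on) layout per cell, and the seed parameter are assumptions made for this example; how the gate is trained is not described in this record and is not shown here.

```python
import math
import random


def gumbel_max_gate(logits, rng):
    """Draw a hard one-hot gate from unnormalized log-probabilities.

    Adds independent Gumbel(0, 1) noise to every logit and takes the argmax,
    which is equivalent to sampling an index from softmax(logits).
    """
    noisy = []
    for logit in logits:
        # random() returns values in [0.0, 1.0); clamp to avoid log(0).
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append(logit + gumbel)
    winner = max(range(len(noisy)), key=noisy.__getitem__)
    return [1 if i == winner else 0 for i in range(len(logits))]


def hard_attention_map(cell_logits, seed=0):
    """Build a binary map from a grid of hypothetical (off, on) logit pairs."""
    rng = random.Random(seed)
    return [
        [gumbel_max_gate([off, on], rng)[1] for (off, on) in row]
        for row in cell_logits
    ]


# Hypothetical 2x2 grid: strongly "on" cells on the diagonal, "off" elsewhere.
grid = [
    [(-50.0, 50.0), (50.0, -50.0)],
    [(50.0, -50.0), (-50.0, 50.0)],
]
attention = hard_attention_map(grid)
```

Each cell of the result is exactly 0 or 1, so the map is "hard" in the sense of selecting or discarding a region rather than weighting it.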
Pages: 2902-2913
Number of pages: 12