Learning Scale-Consistent Attention Part Network for Fine-Grained Image Recognition

Cited by: 23

Authors:

Liu, Huabin [1]

Li, Jianguo [2]

Li, Dian [3]

See, John [4]

Lin, Weiyao [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China

[2] Ant Financial Serv Grp, Beijing 101100, Peoples R China

[3] Tencent Technol Beijing Co Ltd, Beijing 100080, Peoples R China

[4] Heriot Watt Univ, Sch Math & Comp Sci, Putrajaya 62200, Malaysia

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2022 / Vol. 24

Funding:

National Natural Science Foundation of China;

Keywords:

Image recognition; Task analysis; Logic gates; Location awareness; Visualization; Training; Object detection; Fine-grained image recognition; scale-consistent; attention part;

DOI:

10.1109/TMM.2021.3090274

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Discriminative region localization and feature learning are crucial for fine-grained visual recognition. Existing approaches address this with attention mechanisms or part-based methods, but neglect the consistency between attention and local parts as well as the rich relation information among parts. This paper proposes a Scale-consistent Attention Part Network (SCAPNet) to address these gaps. It integrates three novel modules: a grid gate attention unit (gGAU), scale-consistent attention part selection (SCAPS), and part relation modeling (PRM). The gGAU module represents grid regions at a given fine scale with middle-layer CNN features and produces hard attention maps with a lightweight Gumbel-Max based gate. The SCAPS module uses attention to guide part selection across multiple scales and keeps the selection scale-consistent. The PRM module uses the self-attention mechanism to model relationships among parts based on their appearance and relative geo-positions. SCAPNet can be learned end-to-end and demonstrates state-of-the-art accuracy on several publicly available fine-grained recognition datasets (CUB-200-2011, FGVC-Aircraft, Veg200, and Fru92).
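
For readers unfamiliar with the Gumbel-Max trick mentioned in the abstract, the following is a minimal sketch of how a hard 0/1 gate per grid cell could be sampled from two scores. It is not the authors' gGAU implementation: the function name `gumbel_max_gate`, the (off, on) score layout, and the flat list of cells are all assumptions made for illustration, and the sketch covers only the forward sampling step, not how gradients are passed through the gate during training.

```python
import math
import random


def gumbel_max_gate(logits, rng):
    """Sample one hard 0/1 gate per grid cell with the Gumbel-Max trick.

    logits: list of (off_score, on_score) pairs, one per grid cell
            (hypothetical layout, chosen for this illustration).
    rng:    a random.Random instance, passed in for reproducibility.
    Returns a list of ints in {0, 1}.
    """
    gates = []
    for off_score, on_score in logits:
        noisy = []
        for score in (off_score, on_score):
            # Clamp away from 0 so both logarithms below stay finite.
            u = max(rng.random(), 1e-300)
            # Standard Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1).
            gumbel = -math.log(-math.log(u))
            noisy.append(score + gumbel)
        # Taking the argmax over the noisy scores is equivalent to
        # sampling from softmax((off_score, on_score)).
        gates.append(1 if noisy[1] > noisy[0] else 0)
    return gates


if __name__ == "__main__":
    rng = random.Random(0)
    # A 4 x 4 grid flattened to 16 cells, each mildly favoring "on".
    print(gumbel_max_gate([(0.0, 1.0)] * 16, rng))
```

Because the argmax over Gumbel-perturbed scores follows the softmax distribution of the scores, a cell with scores (0, log 3) is switched on about 75% of the time, while a cell with a very large score gap is effectively deterministic.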

Citation

Pages: 2902-2913

Page count: 12
