Dynamic Perception Framework for Fine-Grained Recognition

Cited by: 12

Authors:

Ding, Yao [1]

Han, Zhenjun [1]

Zhou, Yanzhao [1]

Zhu, Yi [1]

Chen, Jie [2,3]

Ye, Qixiang [1]

Jiao, Jianbin [1]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 101408, Peoples R China

[2] Peking Univ, Sch Elect & Comp Engn, Shenzhen 518055, Peoples R China

[3] Pengcheng Lab, Shenzhen 518000, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022 / Vol. 32 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Kernel; Convolution; Feature extraction; Visualization; Image recognition; Task analysis; Radio frequency; Dynamic perception; spatial selective kernel; spatial selective sampling; fine-grained recognition; ATTENTION;

DOI:

10.1109/TCSVT.2021.3069835

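Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract: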

Fine-grained recognition poses the challenge of discriminating categories with only small subtle visual differences, which can be easily overwhelmed by diverse appearance within categories. Conventional approaches generally locate discriminative parts and then recognize the part-based features. However, we find that tuning the effective receptive field (ERF) of the network to the task plays the key role, which enables significant regions to contribute more to the output. Inspired by the receptive field stimulation mechanism of the visual cortex, we propose a Dynamic Perception framework as a solution. Our framework adapts the ERF by considering the image space and the kernel space simultaneously. In the image space, the Spatial Selective Sampling module is adopted to enlarge informative regions locally. In the kernel space, Spatial Selective Kernel convolution is introduced to adapt different kernel sizes for regions of interest and backgrounds by embedding spatial attention in the multi-path convolution. Extensive experiments on challenging benchmarks, including CUB-200-2011, FGVC-Aircraft, and Stanford Cars, demonstrate that our method yields a performance boost over the state-of-the-art methods.

Citation

Pages: 1353-1365

Page count: 13

62 references in total

