Your "Flamingo" is My "Bird": Fine-Grained, or Not

Cited by: 82

Authors:

Chang, Dongliang [1]

Pang, Kaiyue [2]

Zheng, Yixiao [1]

Ma, Zhanyu [1]

Song, Yi-Zhe [2]

Guo, Jun [1]

Affiliations:

[1] Beijing University of Posts and Telecommunications, School of Artificial Intelligence, Pattern Recognition and Intelligent Systems Laboratory, Beijing, People's Republic of China

[2] University of Surrey, CVSSP, SketchX, Guildford, Surrey, England

Source:

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 | 2021

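Funding:

National Key Research and Development Program of China; Beijing Natural Science Foundation; National Natural Science Foundation of China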

Keywords:

DOI:

10.1109/CVPR46437.2021.01131

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Whether what you see in Figure 1 is a "flamingo" or a "bird", is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for the majority of us non-experts just "bird" would probably suffice. The real question is therefore - how can we tailor for different fine-grained definitions under divergent levels of expertise. For that, we re-envisage the traditional setting of FGVC, from single-label classification, to that of top-down traversal of a pre-defined coarse-to-fine label hierarchy - so that our answer becomes "bird" ⇒ "Phoenicopteriformes" ⇒ "Phoenicopteridae" ⇒ "flamingo". To approach this new problem, we first conduct a comprehensive human study where we confirm that most participants prefer multi-granularity labels, regardless whether they consider themselves experts. We then discover the key intuition that: coarse-level label prediction exacerbates fine-grained feature learning, yet fine-level feature betters the learning of coarse-level classifier. This discovery enables us to design a very simple albeit surprisingly effective solution to our new problem, where we (i) leverage level-specific classification heads to disentangle coarse-level features with fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn helps with better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting, and performs better than state-of-the-art on the traditional single-label FGVC problem as well. Thanks to its simplicity, our method can be easily implemented on top of any existing FGVC frameworks and is parameter-free.
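The two design points in the abstract, level-specific heads and finer-grained features feeding coarser-grained predictions, can be illustrated with a minimal sketch. It is not the authors' implementation: the level names, the equal-sized split of the feature vector, and the helper names `split_by_level` and `head_inputs` are all assumptions made for illustration, and the classification heads themselves are left out.

```python
from typing import Dict, List

# Hypothetical hierarchy levels, ordered from coarse to fine (an assumption).
LEVELS = ["order", "family", "species"]


def split_by_level(features: List[float], levels: List[str]) -> Dict[str, List[float]]:
    """Split one feature vector into equal, level-specific chunks (coarse first)."""
    if len(features) % len(levels) != 0:
        raise ValueError("feature length must be divisible by the number of levels")
    size = len(features) // len(levels)
    return {name: features[i * size:(i + 1) * size] for i, name in enumerate(levels)}


def head_inputs(chunks: Dict[str, List[float]], levels: List[str]) -> Dict[str, List[float]]:
    """Build the input of each level's head: its own chunk plus all finer chunks."""
    inputs = {}
    for i, name in enumerate(levels):
        merged: List[float] = []
        for finer in levels[i:]:  # this level and every finer level after it
            merged.extend(chunks[finer])
        inputs[name] = merged
    return inputs


# Toy 6-dimensional feature vector standing in for a backbone output.
features = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
chunks = split_by_level(features, LEVELS)
inputs = head_inputs(chunks, LEVELS)
```

In this sketch the finest head sees only its own chunk, while each coarser head also sees the chunks of every finer level, which mirrors point (ii) of the abstract. Splitting and concatenating add no learnable weights of their own.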

Citation

Pages: 11471-11480

Page count: 10
