Your "Flamingo" is My "Bird": Fine-Grained, or Not

Cited by: 82
Authors
Chang, Dongliang [1]
Pang, Kaiyue [2]
Zheng, Yixiao [1]
Ma, Zhanyu [1]
Song, Yi-Zhe [2]
Guo, Jun [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Pattern Recognit & Intelligent Syst Lab, Beijing, Peoples R China
[2] Univ Surrey, CVSSP, SketchX, Guildford, Surrey, England
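Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Key Research and Development Program of China; Beijing Natural Science Foundation; National Natural Science Foundation of China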
DOI
10.1109/CVPR46437.2021.01131
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Whether what you see in Figure 1 is a "flamingo" or a "bird", is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for the majority of us non-experts just "bird" would probably suffice. The real question is therefore - how can we tailor for different fine-grained definitions under divergent levels of expertise. For that, we re-envisage the traditional setting of FGVC, from single-label classification, to that of top-down traversal of a pre-defined coarse-to-fine label hierarchy - so that our answer becomes "bird" ⇒ "Phoenicopteriformes" ⇒ "Phoenicopteridae" ⇒ "flamingo". To approach this new problem, we first conduct a comprehensive human study where we confirm that most participants prefer multi-granularity labels, regardless whether they consider themselves experts. We then discover the key intuition that: coarse-level label prediction exacerbates fine-grained feature learning, yet fine-level feature betters the learning of coarse-level classifier. This discovery enables us to design a very simple albeit surprisingly effective solution to our new problem, where we (i) leverage level-specific classification heads to disentangle coarse-level features with fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn helps with better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting, and performs better than state-of-the-art on the traditional single-label FGVC problem as well. Thanks to its simplicity, our method can be easily implemented on top of any existing FGVC frameworks and is parameter-free.
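The two design points in the abstract, level-specific heads and finer-grained features feeding coarser-grained predictions, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one backbone feature vector split into equal chunks ordered coarse to fine, and uses random linear weights in place of trained heads. All function names, the three-level setup, and the class counts are hypothetical, and no training or gradient handling is modelled.

```python
import random


def split_features(feature, num_levels):
    """Split one feature vector into equal per-level chunks (coarse -> fine)."""
    size = len(feature) // num_levels
    return [feature[i * size:(i + 1) * size] for i in range(num_levels)]


def head_input(chunks, level):
    """Input for the head at `level`: its own chunk plus every finer chunk."""
    out = []
    for chunk in chunks[level:]:
        out.extend(chunk)
    return out


def make_heads(num_classes, feat_dim, seed=0):
    """Random linear heads (stand-ins for trained ones), one per hierarchy level."""
    rng = random.Random(seed)
    levels = len(num_classes)
    size = feat_dim // levels
    heads = []
    for level, n_cls in enumerate(num_classes):
        in_dim = size * (levels - level)  # coarser heads see more chunks
        heads.append([[rng.uniform(-1, 1) for _ in range(in_dim)]
                      for _ in range(n_cls)])
    return heads


def linear(weights, x):
    """Matrix-vector product; `weights` holds one row per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def predict(feature, heads):
    """Return one predicted class index per level, coarse to fine."""
    chunks = split_features(feature, len(heads))
    preds = []
    for level, weights in enumerate(heads):
        logits = linear(weights, head_input(chunks, level))
        preds.append(max(range(len(logits)), key=logits.__getitem__))
    return preds


if __name__ == "__main__":
    # Hypothetical hierarchy: 2 orders, 3 families, 5 species.
    heads = make_heads([2, 3, 5], feat_dim=12)
    feature = [0.1 * i for i in range(12)]
    print(predict(feature, heads))  # one index per level
```

In this sketch the coarsest head reads all 12 feature dimensions while the finest head reads only its own 4, so fine-level features contribute to coarse predictions but not the reverse.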
Pages: 11471-11480
Number of pages: 10