Hierarchical Few-Shot Object Detection: Problem, Benchmark and Method

Cited by: 6
Authors
Zhang, Lu [1]
Wang, Yang [2]
Zhou, Jiaogen [3]
Zhang, Chenbo [1]
Zhang, Yinglu [1]
Guan, Jihong [2]
Bian, Yatao [4]
Zhou, Shuigeng [1]
Affiliations
[1] Fudan Univ, Shanghai, Peoples R China
[2] Tongji Univ, Shanghai, Peoples R China
[3] Huaiyin Normal Univ, Sch Urban & Environm Sci, Huaian, Peoples R China
[4] Tencent AI Lab, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Few-shot object detection; hierarchical few-shot object detection; Benchmark; hierarchical classification;
DOI
10.1145/3503161.3548412
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Few-shot object detection (FSOD) aims to detect objects given only a few examples. However, existing FSOD methods do not consider the hierarchical, fine-grained category structures of objects that exist widely in real life; for example, animals are taxonomically classified into orders, families, genera, species, etc. In this paper, we propose and solve a new problem called hierarchical few-shot object detection (Hi-FSOD), which aims to detect objects with hierarchical categories under the FSOD paradigm. To this end, on the one hand, we build HiFSOD-Bird, the first large-scale, high-quality Hi-FSOD benchmark dataset, which contains 176,350 wild-bird images falling into 1,432 categories. All categories are organized into a 4-level taxonomy consisting of 32 orders, 132 families, 572 genera and 1,432 species. On the other hand, we propose HiCLPL, the first Hi-FSOD method, in which a hierarchical contrastive learning approach constrains the feature space so that the feature distribution of objects is consistent with the hierarchical taxonomy, strengthening the model's generalization power. Meanwhile, a probabilistic loss is designed to enable child nodes to correct the classification errors of their parent nodes in the taxonomy. Extensive experiments on the HiFSOD-Bird benchmark show that HiCLPL outperforms existing FSOD methods.
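The abstract does not give the form of HiCLPL's probabilistic loss. As a hedged, generic illustration only (an assumption, not the authors' loss), one common way to let fine-level predictions inform coarser levels is to derive genus, family and order probabilities by summing species-level softmax probabilities over the taxonomy, and then add a negative log-likelihood term at every level. `TAXONOMY`, `softmax`, `level_probs` and `hierarchical_nll` are hypothetical names, and the category labels are placeholders rather than HiFSOD-Bird classes.

```python
import math

# Hypothetical toy taxonomy (NOT from HiFSOD-Bird): each species maps to its
# (order, family, genus) ancestors, mirroring the 4-level structure.
TAXONOMY = {
    "species_a": ("order_1", "family_1", "genus_1"),
    "species_b": ("order_1", "family_1", "genus_2"),
    "species_c": ("order_2", "family_2", "genus_3"),
}


def softmax(logits):
    """Numerically stable softmax over a dict of species -> logit."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def level_probs(species_probs, level):
    """Marginalize species probabilities to a coarser level.

    level: 0 = order, 1 = family, 2 = genus. A parent's probability is the
    sum of its descendant species' probabilities.
    """
    out = {}
    for species, p in species_probs.items():
        ancestor = TAXONOMY[species][level]
        out[ancestor] = out.get(ancestor, 0.0) + p
    return out


def hierarchical_nll(species_logits, true_species):
    """Assumed loss: species NLL plus the NLL of the true ancestor at each level."""
    probs = softmax(species_logits)
    loss = -math.log(probs[true_species])
    for level, ancestor in enumerate(TAXONOMY[true_species]):
        loss -= math.log(level_probs(probs, level)[ancestor])
    return loss


logits = {"species_a": 2.0, "species_b": 1.0, "species_c": 0.5}
print(level_probs(softmax(logits), 0))
print(hierarchical_nll(logits, "species_a"))
```

Because every parent probability is the sum of its children's probabilities, coarse-level predictions stay consistent with the taxonomy by construction, so evidence at the species level directly shifts the order, family and genus scores. Whether HiCLPL uses this kind of marginalization is not stated in the abstract.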
Pages: 2002-2011
Number of pages: 10