Hierarchical Context Embedding for Region-Based Object Detection

Cited by: 23

Authors:

Chen, Zhao-Min ^{[1,2]}

Jin, Xin ^{[2]}

Zhao, Borui ^{[2]}

Wei, Xiu-Shen ^{[2]}

Guo, Yanwen ^{[1]}

Affiliations:

[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China

[2] Megvii Res Nanjing, Megvii Technol, Beijing, Peoples R China

Source:

COMPUTER VISION - ECCV 2020, PT XXI | 2020 / Vol. 12366

Funding:

National Natural Science Foundation of China;

Keywords:

Object detection; Context embedding; Region-based CNNs;

DOI:

10.1007/978-3-030-58589-1_38

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

State-of-the-art two-stage object detectors apply a classifier to a sparse set of object proposals, relying on region-wise features extracted by RoIPool or RoIAlign as inputs. The region-wise features, in spite of aligning well with the proposal locations, may still lack the crucial context information which is necessary for filtering out noisy background detections, as well as recognizing objects possessing no distinctive appearances. To address this issue, we present a simple but effective Hierarchical Context Embedding (HCE) framework, which can be applied as a plug-and-play component, to facilitate the classification ability of a series of region-based detectors by mining contextual cues. Specifically, to advance the recognition of context-dependent object categories, we propose an image-level categorical embedding module which leverages the holistic image-level context to learn object-level concepts. Then, novel RoI features are generated by exploiting hierarchically embedded context information beneath both whole images and interested regions, which are also complementary to conventional RoI features. Moreover, to make full use of our hierarchical contextual RoI features, we propose the early-and-late fusion strategies (i.e., feature fusion and confidence fusion), which can be combined to boost the classification accuracy of region-based detectors. Comprehensive experiments demonstrate that our HCE framework is flexible and generalizable, leading to significant and consistent improvements upon various region-based detectors, including FPN, Cascade R-CNN and Mask R-CNN.


Pages: 633-648

Number of pages: 16

