HCE: Hierarchical Context Embedding for Region-Based Object Detection

Cited by: 25
Authors
Chen, Zhao-Min [1]
Jin, Xin [2]
Zhao, Bo-Rui [3]
Zhang, Xiaoqin [1]
Guo, Yanwen [4]
Affiliations
[1] Wenzhou Univ, Coll Comp Sci & Artificial Intelligence, Wenzhou 325035, Peoples R China
[2] Samsung Res, Samsung Res Nanjing, Nanjing 210012, Peoples R China
[3] Megvii Technol, Megvii Res Nanjing, Nanjing 210046, Peoples R China
[4] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Detectors; Feature extraction; Proposals; Object detection; Head; Training; Noise measurement; context embedding; region-based CNNs;
DOI
10.1109/TIP.2021.3099733
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-of-the-art two-stage object detectors apply a classifier to a sparse set of object proposals, relying on region-wise features extracted by RoIPool or RoIAlign as inputs. Although these region-wise features align well with the proposal locations, they may still lack the context information needed to filter out noisy background detections and to recognize objects that have no distinctive appearance. To address this issue, we present a simple but effective Hierarchical Context Embedding (HCE) framework, which can be applied as a plug-and-play component to improve the classification ability of a range of region-based detectors by mining contextual cues. Specifically, to advance the recognition of context-dependent object categories, we propose an image-level categorical embedding module that leverages holistic image-level context to learn object-level concepts. Novel RoI features are then generated by exploiting the hierarchically embedded context information in both whole images and regions of interest; these features are complementary to conventional RoI features. Moreover, to make full use of our hierarchical contextual RoI features, we propose early and late fusion strategies (i.e., feature fusion and confidence fusion), which can be combined to boost the classification accuracy of region-based detectors. Comprehensive experiments demonstrate that our HCE framework is flexible and generalizable, leading to significant and consistent improvements over various region-based detectors, including FPN, Cascade R-CNN, Mask R-CNN and PA-FPN. With simple modifications, our HCE framework can be conveniently adapted to fit the structure of one-stage detectors, achieving improved performance for SSD, RetinaNet and EfficientDet.
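The abstract names two fusion strategies, feature fusion (early) and confidence fusion (late), but does not say how either is computed. The following is a minimal sketch of the general idea only, not the authors' implementation. It assumes early fusion is an element-wise sum of a conventional RoI feature vector and a contextual RoI feature vector, and late fusion is a weighted average of the class probabilities produced by two classification heads. The function names, the `weight` parameter, and both combination rules are hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_fusion(roi_feat, ctx_feat):
    """Early fusion: merge two feature vectors before classification.

    Assumption: element-wise sum. The abstract does not state the operator.
    """
    if len(roi_feat) != len(ctx_feat):
        raise ValueError("feature vectors must have the same length")
    return [r + c for r, c in zip(roi_feat, ctx_feat)]


def confidence_fusion(roi_logits, ctx_logits, weight=0.5):
    """Late fusion: merge class confidences from two classification heads.

    Assumption: weighted average of softmax probabilities, with a
    hypothetical mixing weight applied to the conventional RoI head.
    """
    if len(roi_logits) != len(ctx_logits):
        raise ValueError("both heads must score the same set of classes")
    p_roi = softmax(roi_logits)
    p_ctx = softmax(ctx_logits)
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_roi, p_ctx)]


# Illustrative usage with made-up numbers.
fused_feat = feature_fusion([1.0, 2.0], [0.5, 0.5])
fused_conf = confidence_fusion([2.0, 0.0], [0.0, 2.0])
```

In this sketch the two strategies are independent, so they can be used together: the fused feature could feed one classification head while the fused confidence combines that head's scores with another's. This mirrors the abstract's statement that the strategies can be combined, without claiming to reproduce the paper's design.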
Pages: 6917-6929
Number of pages: 13