Improving Single Shot Object Detection With Feature Scale Unmixing

Cited by: 20
Authors
Li, Yazhao [1 ]
Pang, Yanwei [1 ]
Cao, Jiale [1 ]
Shen, Jianbing [2 ]
Shao, Ling [2 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin Key Lab Brain Inspired Intelligence Techn, Tianjin 300071, Peoples R China
[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Detectors; Object detection; Visualization; Semantics; Sports; Real-time systems; Scale unmixing; Scale-aware features; Single-shot detector; Feature erasing
DOI
10.1109/TIP.2020.3048630
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Owing to their real-time speed and improved performance, single-shot detectors have recently attracted considerable attention. To handle complex scale variations, single-shot detectors make scale-aware predictions from multiple pyramid layers: typically, small objects are detected on shallow layers, while large objects are detected on deep layers. However, the features in the pyramid are not sufficiently scale-aware, which limits detection performance. Object scale variations cause two common problems in single-shot detectors: (1) the false-negative problem, i.e., small objects are easily missed because their features are weak; and (2) the part false-positive problem, i.e., a salient part of a large object is sometimes detected as a separate object. Motivated by these observations, this paper proposes a new Neighbor Erasing and Transferring (NET) mechanism that unmixes feature scales to obtain scale-aware features. In NET, a Neighbor Erasing Module (NEM) is designed to erase the salient features of large objects and emphasize the features of small objects in shallow layers, and a Neighbor Transferring Module (NTM) is introduced to transfer the erased features and highlight large objects in deep layers. With this mechanism, a single-shot network called NETNet is constructed for scale-aware object detection. In addition, nearest-neighbor pyramid features are aggregated to further enhance NET. Experiments on the MS COCO and UAVDT datasets demonstrate the effectiveness of the method. On MS COCO, NETNet obtains 38.5% AP at 27 FPS and 32.0% AP at 55 FPS, achieving a better trade-off between real-time speed and detection accuracy.
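The erasing-and-transferring idea described in the abstract can be illustrated with a small sketch. The Python/PyTorch code below is a hypothetical illustration, not the authors' released implementation: the module name NeighborErasingTransferring, the sigmoid-gated subtraction used for erasing, and the average pooling used for transferring are all assumptions made for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborErasingTransferring(nn.Module):
    # Sketch of the NEM/NTM idea: erase large-object responses from a shallow
    # pyramid feature and transfer them to its deeper neighbor. The concrete
    # design choices (gate, subtraction, pooling) are assumptions, not the
    # paper's architecture.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, shallow, deep):
        # Use the upsampled deep neighbor to estimate which shallow-layer
        # responses belong to large objects.
        guide = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        large_mask = self.gate(guide)
        erased = shallow * large_mask          # large-object features
        shallow_out = shallow - erased         # NEM: shallow layer keeps small objects
        # NTM: pool the erased features to the deep resolution and fuse them back
        # so that large objects are highlighted on the deep layer.
        transferred = F.adaptive_avg_pool2d(erased, deep.shape[-2:])
        deep_out = self.fuse(deep + transferred)
        return shallow_out, deep_out

# Toy usage with FPN-like features: P3 (stride 8) and P4 (stride 16), 256 channels.
net = NeighborErasingTransferring(channels=256)
p3 = torch.randn(1, 256, 80, 80)
p4 = torch.randn(1, 256, 40, 40)
p3_small, p4_large = net(p3, p4)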
Pages: 2708-2721
Number of Pages: 14