Mask-guided SSD for small-object detection

Cited by: 49
Authors
Sun, Chang [1]
Ai, Yibo [1]
Wang, Sheng [2]
Zhang, Weidong [1]
Affiliations
[1] Univ Sci & Technol Beijing, Natl Ctr Mat Serv Safety, Beijing, Peoples R China
[2] UCAR, AI Lab, 118 East Zhongguancun Rd, Beijing, Peoples R China
Keywords
Deep learning; Neural network; Object detection; Atrous convolution; Feature fusion; Pedestrian detection
DOI
10.1007/s10489-020-01949-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting small objects is challenging for the single-shot multibox detector (SSD) because the features of small objects carry limited information and complex backgrounds interfere with detection. Here, we improved the performance of the SSD on small target objects by enhancing detection features with contextual information and by introducing a segmentation mask to eliminate background regions. The proposed model, referred to as the "mask-guided SSD" (Mask-SSD), comprises two branches: a detection branch and a segmentation branch. We created a feature-fusion module that allows the detection branch to exploit contextual information in high-resolution feature maps, and built the segmentation branch primarily from atrous convolutions to provide additional contextual information to the detection branch. The segmentation branch took the output of the detection branch as its input, and the resulting segmentation features were fused with the detection features to classify and locate target objects. Additionally, the segmentation features were used to generate the mask, which guided the detection branch toward potential foreground regions. Evaluation of Mask-SSD on the Tsinghua-Tencent 100K and Caltech pedestrian datasets demonstrated its effectiveness at detecting small objects and performance comparable to that of other state-of-the-art methods.
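The abstract gives no layer-level details, so the following is only a toy 1-D sketch of the three ideas it names: atrous (dilated) convolution to widen context, fusion of segmentation and detection features, and mask-based suppression of background positions. All function names, the element-wise-sum fusion, and the 0.5 threshold are assumptions made for illustration; they are not taken from the paper.

```python
# Toy 1-D illustration. Every name here, the sum-based fusion, and the
# thresholded gating rule are assumptions, not the paper's actual layers.

def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D atrous (dilated) convolution, cross-correlation form.

    Kernel taps are placed `rate` samples apart, so a 3-tap kernel covers
    (3 - 1) * rate + 1 input samples without adding any weights.
    """
    span = (len(kernel) - 1) * rate + 1  # input samples seen by one output
    return [
        sum(w * signal[i + j * rate] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def fuse(det_feats, seg_feats):
    """Fuse two equally sized feature vectors by element-wise sum (assumed)."""
    return [d + s for d, s in zip(det_feats, seg_feats)]


def apply_mask(feats, mask, threshold=0.5):
    """Zero features where the mask score is below `threshold` (assumed rule)."""
    return [f if m >= threshold else 0.0 for f, m in zip(feats, mask)]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 0, -1]
    # rate=1: each output spans 3 input samples; rate=2: each spans 5.
    print(atrous_conv1d(signal, kernel, rate=1))
    print(atrous_conv1d(signal, kernel, rate=2))
    # Keep only positions that the (hypothetical) mask marks as foreground.
    print(apply_mask(fuse([0.2, 0.9, 0.4], [0.0, 0.0, 0.0]), [0.1, 0.8, 0.6]))
```

Raising `rate` enlarges the span of input that each output sees while the kernel keeps the same three weights, which is the sense in which atrous convolution supplies additional context.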
Pages: 3311-3322
Page count: 12