An enhanced SSD with feature fusion and visual reasoning for object detection

被引:52
作者
Leng, Jiaxu [1 ]
Liu, Ying [1 ]
机构
[1] Univ Chinese Acad Sci, Sch Comp & Control Engn, Campus Yanqi, Beijing 101400, Peoples R China
关键词
Object detection; Feature fusion; Visual reasoning; Feature maps; CONVOLUTIONAL NETWORKS;
D O I
10.1007/s00521-018-3486-1
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Single Shot Multibox Detector (SSD) is one of the top performing object detection algorithms in terms of both accuracy and speed. SSD achieves impressive performance on various datasets by using different output layers for object detection. However, each layer in the feature pyramid is used independently, and SSD considers only the fine-grained details of the objects but ignores the context surrounding objects. In this paper, we proposed an enhanced SSD, called ESSD, that improved the performance of the conventional SSD by fusing feature maps of different output layers, instead of growing layers close to the input data. Our method used two-way transfer of feature information and feature fusion to enhance the network. To assist further with object detection, we proposed a visual reasoning method that utilized fully the relationships between objects instead of using only the features of the objects themselves. This addition of visual reasoning proved very effective for detecting objects that are too small or have small features. To evaluate the proposed ESSD, we trained the model with VOC2007 and VOC2012 training sets and evaluated the performance on the Pascal VOC2007 test set. For 300 x 300 input, ESSD achieved 79.2% mean average precision (mAP) at 52.0 frames per second (FPS), and for 512 x 512 input, this approach achieved 82.4% mAP at 18.6 FPS. These results demonstrated that our proposed method can achieve state-of-the-art mAP, which is a better result than provided by the conventional SSD and other advanced detectors.
引用
收藏
页码:6549 / 6558
页数:10
相关论文
共 25 条
  • [1] Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
    Bell, Sean
    Zitnick, C. Lawrence
    Bala, Kavita
    Girshick, Ross
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2874 - 2883
  • [2] Dai J, 2016, PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), P1796, DOI 10.1109/ICIT.2016.7475036
  • [3] Histograms of oriented gradients for human detection
    Dalal, N
    Triggs, B
    [J]. 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, : 886 - 893
  • [4] Scalable Object Detection using Deep Neural Networks
    Erhan, Dumitru
    Szegedy, Christian
    Toshev, Alexander
    Anguelov, Dragomir
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 2155 - 2162
  • [5] The PASCAL Visual Object Classes Challenge: A Retrospective
    Everingham, Mark
    Eslami, S. M. Ali
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 111 (01) : 98 - 136
  • [6] Fu C., 2017, ARXIV, P1
  • [7] Fukui A., 2016, P EMNLP, P457, DOI DOI 10.18653/V1/D16-1044
  • [8] Compact Bilinear Pooling
    Gao, Yang
    Beijbom, Oscar
    Zhang, Ning
    Darrell, Trevor
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 317 - 326
  • [9] Girshick R., 2014, IEEE COMP SOC C COMP, DOI [10.1109/CVPR.2014.81, DOI 10.1109/CVPR.2014.81]
  • [10] Fast R-CNN
    Girshick, Ross
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1440 - 1448