A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

Cited by: 1116
Authors
Cai, Zhaowei [1]
Fan, Quanfu [2]
Feris, Rogerio S. [2]
Vasconcelos, Nuno [1]
Affiliations
[1] Univ Calif San Diego, SVCL, San Diego, CA 92103 USA
[2] IBM TJ Watson Res, Yorktown Hts, NY USA
Source
COMPUTER VISION - ECCV 2016, PT IV | 2016 / Vol. 9908
Funding
US National Science Foundation;
Keywords
Object detection; Multi-scale; Unified neural network;
DOI
10.1007/978-3-319-46493-0_22
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.
Pages: 354-370
Page count: 17