Taking a Look at Small-Scale Pedestrians and Occluded Pedestrians

Cited by: 41
Authors
Cao, Jiale [1]
Pang, Yanwei [1]
Han, Jungong [2]
Gao, Bolin [3]
Li, Xuelong [4, 5]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin Key Lab Brain Inspired Intelligence Techn, Tianjin 300072, Peoples R China
[2] Univ Warwick, Data Sci Grp, Coventry CV4 7AL, W Midlands, England
[3] China Automot Technol & Res Ctr Co Ltd, Tianjin 300300, Peoples R China
[4] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[5] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Small-scale pedestrians; occluded pedestrians; location bootstrap; semantic transition
DOI
10.1109/TIP.2019.2957927
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Small-scale pedestrian detection and occluded pedestrian detection are two challenging tasks. However, most state-of-the-art methods handle only one of them at a time, which leads to relatively poor performance when both are required simultaneously in practice. This paper finds that small-scale pedestrian detection and occluded pedestrian detection share a common problem: inaccurate localization. Solving this problem can therefore improve performance on both tasks. To this end, we pay more attention to predicted bounding boxes with worse location precision and extract more contextual information around objects, proposing two modules: location bootstrap and semantic transition. The location bootstrap reweights the regression loss: the loss of a predicted bounding box far from its ground truth is upweighted, and the loss of a predicted bounding box near its ground truth is downweighted. The semantic transition adds more contextual information and relieves the semantic inconsistency of skip-layer fusion. Because the location bootstrap is not used at the test stage and the semantic transition is lightweight, the proposed method adds little extra computational cost during inference. Experiments on the challenging CityPersons and Caltech datasets show that the proposed method outperforms state-of-the-art methods on small-scale and occluded pedestrians (e.g., improvements of 5.20 and 4.73 on Caltech).
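
The reweighting idea behind the location bootstrap can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything specific here is an assumption made for illustration: the weight `(1 - IoU) ** gamma`, the normalization to a mean weight of 1, the smooth L1 base loss, and the function names `box_iou`, `smooth_l1`, and `reweighted_regression_loss`. It is not the authors' implementation.

```python
import numpy as np

def box_iou(pred, gt):
    """Row-wise IoU for boxes stored as (x1, y1, x2, y2)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 2], gt[:, 2])
    y2 = np.minimum(pred[:, 3], gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_pred = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_pred + area_gt - inter)

def smooth_l1(diff, beta=1.0):
    """Standard smooth L1 penalty, applied element-wise."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta)

def reweighted_regression_loss(pred, gt, gamma=1.0):
    """Upweight poorly localized boxes and downweight well localized ones.

    Assumed weighting (not taken from the paper): w = (1 - IoU) ** gamma,
    rescaled so that the mean weight over the batch is 1.
    """
    weights = (1.0 - box_iou(pred, gt)) ** gamma
    weights = weights / (weights.mean() + 1e-8)
    per_box = smooth_l1(pred - gt).sum(axis=1)
    return (weights * per_box).mean(), weights

# Two predictions for the same ground-truth box: one close, one far.
gt = np.array([[0.0, 0.0, 10.0, 20.0], [0.0, 0.0, 10.0, 20.0]])
pred = np.array([[0.0, 0.0, 10.0, 19.0], [5.0, 5.0, 15.0, 25.0]])
loss, weights = reweighted_regression_loss(pred, gt)
print(loss, weights)
```

In this sketch the second prediction, which overlaps the ground truth less, receives the larger weight. As the abstract notes for the location bootstrap, such a reweighting acts only on the training loss and is not used at the test stage.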
Pages: 3143-3152
Number of pages: 10