Multi-Grained Deep Feature Learning for Robust Pedestrian Detection

Cited by: 26

Authors:

Lin, Chunze [1,2]

Lu, Jiwen [1,2]

Zhou, Jie [1,2]

Affiliations:

[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Dept Automat, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2019 / Vol. 29 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Pedestrian detection; human parsing; attention; deep feature learning; OCCLUSION; FACE;

DOI:

10.1109/TCSVT.2018.2883558

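Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;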

Abstract:

In this paper, we address the challenging problem of detecting pedestrians that are heavily occluded and/or far from the camera. Unlike most existing pedestrian detection methods, which use only coarse-resolution feature maps with fixed receptive fields, our approach exploits multi-grained deep features to make the detector robust to the visible parts of occluded pedestrians and to small-size targets. Specifically, we jointly train a multi-scale network and a human parsing network in a weakly supervised manner with only bounding box annotations. We carefully design the multi-scale network to predict pedestrians of particular scales with the most appropriate feature maps by matching their receptive fields with the target sizes. The human parsing network generates a fine-grained attention map, which helps guide the detector to focus on the visible parts of occluded pedestrians and on small-size instances. Both networks are computed in parallel and form a unified single-stage pedestrian detector, which ensures a suitable tradeoff between accuracy and speed. Moreover, to make our detector more robust to occlusion, we introduce an adversarial hiding network that generates occlusions on pedestrians with the goal of fooling the detector, which in turn adapts itself to learn to localize these adversarial instances. Experiments on three challenging pedestrian detection benchmarks show that our proposed method achieves state-of-the-art performance and runs 2× faster than competing methods.
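
Two ideas from the abstract, matching receptive fields to target sizes and re-weighting features with an attention map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pick_level` and `apply_attention`, the receptive-field values, and the closest-size matching rule are all assumptions made for illustration only.

```python
from typing import List, Sequence

# Hypothetical receptive-field sizes (in pixels) of four feature levels.
# These values are illustrative and are not taken from the paper.
RECEPTIVE_FIELDS = [32, 64, 128, 256]


def pick_level(target_height: float,
               receptive_fields: Sequence[float] = RECEPTIVE_FIELDS) -> int:
    """Return the index of the level whose receptive field is closest
    to the target height (an assumed matching rule)."""
    return min(range(len(receptive_fields)),
               key=lambda i: abs(receptive_fields[i] - target_height))


def apply_attention(features: List[List[float]],
                    attention: List[List[float]]) -> List[List[float]]:
    """Re-weight a 2-D feature map element-wise by an attention map
    whose values are assumed to lie in [0, 1]."""
    return [[f * a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(features, attention)]
```

Under these assumptions, a pedestrian about 40 pixels tall would be handled by the finest level, while a 300-pixel one would go to the coarsest. An attention value near zero suppresses the corresponding feature location, for example a region covered by an occluder.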


Pages: 3608-3621

Page count: 14

References: 73 in total

