3D Sensor Based Pedestrian Detection by Integrating Improved HHA Encoding and Two-Branch Feature Fusion

Cited by: 12
Authors
Tan, Fang [1]
Xia, Zhaoqiang [1]
Ma, Yupeng [1]
Feng, Xiaoyi [1]
Affiliation
[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710060, Peoples R China
Keywords
3D sensor; multi-modal data; pedestrian detection; HHA; feature fusion; PEOPLE;
DOI
10.3390/rs14030645
CLC number (Chinese Library Classification)
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
Pedestrian detection is important in many computer vision tasks, but when only RGB images are used it still suffers from problems such as poor illumination and occlusion, especially in outdoor and long-range scenes. Combining RGB images with depth information acquired by 3D sensors can effectively alleviate these problems. The key questions in RGB-D pedestrian detection are therefore how to utilize depth information and how to fuse RGB and depth features. This paper first improves HHA, the most commonly used depth encoding method, by optimizing gravity-direction extraction and depth-value mapping; the encoding generates a pseudo-color image from the depth information. A two-branch feature fusion extraction module (TFFEM) is then proposed to obtain the local and global features of both modalities. Based on TFFEM, an RGB-D pedestrian detection network is designed to locate pedestrians. In experiments on four publicly available datasets, the improved HHA encoding method is twice as fast and extracts the gravity direction more accurately. The pedestrian detection performance of the proposed network is validated on the KITTI and EPFL datasets, where it achieves state-of-the-art performance; the proposed method also ranked third among all published works on the KITTI leaderboard. Overall, the proposed method effectively fuses RGB and depth features and overcomes the effects of illumination and occlusion in pedestrian detection.
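HHA encoding turns a single-channel depth map into three channels: horizontal disparity, height above ground, and the angle between each pixel's local surface normal and the gravity direction. The Python sketch below illustrates only the baseline idea, not the improved gravity-direction extraction or depth-value mapping proposed in this paper. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), an "up" vector that is given rather than estimated, strictly positive depth values, and that the lowest observed point is the ground. All function names (backproject, surface_normals, minmax_to_uint8, hha_encode) are hypothetical.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (meters) to an HxWx3 point cloud in camera coordinates
    (x right, y down, z forward) using a pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def surface_normals(points):
    """Per-pixel unit normals from the cross product of image-space gradients,
    flipped so that every normal faces the camera at the origin."""
    dx = np.gradient(points, axis=1)
    dy = np.gradient(points, axis=0)
    n = np.cross(dx, dy)
    n = n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)
    facing_away = np.sum(n * points, axis=-1, keepdims=True) > 0
    return np.where(facing_away, -n, n)


def minmax_to_uint8(channel):
    """Linearly stretch a channel to [0, 255]; a constant channel maps to 0."""
    lo, hi = channel.min(), channel.max()
    if hi - lo < 1e-12:
        return np.zeros(channel.shape, dtype=np.uint8)
    return np.round(255.0 * (channel - lo) / (hi - lo)).astype(np.uint8)


def hha_encode(depth, fx, fy, cx, cy, up=(0.0, -1.0, 0.0)):
    """Baseline HHA pseudo-color image (H x W x 3, uint8) from a depth map.

    Assumptions: depth > 0 everywhere, the up direction (opposite to gravity)
    is given rather than estimated, and the lowest point is the ground.
    """
    depth = np.asarray(depth, dtype=np.float64)
    if np.any(depth <= 0):
        raise ValueError("this sketch expects strictly positive depth values")
    up = np.asarray(up, dtype=np.float64)
    up = up / np.linalg.norm(up)
    points = backproject(depth, fx, fy, cx, cy)

    # Channel 1: horizontal disparity, approximated here as inverse depth.
    disparity = 1.0 / depth
    # Channel 2: height above the lowest point, measured along the up direction.
    height = points @ up
    height_above_ground = height - height.min()
    # Channel 3: angle (degrees) between the surface normal and the up direction.
    cos_angle = np.clip(surface_normals(points) @ up, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))

    return np.stack([
        minmax_to_uint8(disparity),
        minmax_to_uint8(height_above_ground),
        np.round(255.0 * angle / 180.0).astype(np.uint8),
    ], axis=-1)
```

Per-image min-max stretching of the disparity and height channels is just one simple way to map values into 0-255, and the angle channel uses a fixed 0-180 degree range so that a uniform surface orientation still yields a meaningful value. The paper's optimized depth-value mapping and gravity estimation would replace these simplified choices.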
Pages: 19