EPDet: Enhancing point clouds features with effective representation for 3D object detection

Cited by: 4
Authors
Chen, Yidong [1 ]
Cai, Guorong [1 ]
Xia, Qiming [2 ]
Liu, Zhaoliang [1 ]
Zeng, Binghui [1 ]
Zhang, Zongliang [1 ]
Li, Jonathan [3 ,4 ]
Wang, Zongyue [1 ]
Affiliations
[1] Jimei Univ, Sch Comp Engn, Xiamen 361021, Peoples R China
[2] Xiamen Univ, Sch Informat, 422 Siming South Rd, Xiamen 361005, Fujian, Peoples R China
[3] Univ Waterloo, Dept Geog & Environm Management, Waterloo, ON N2L 3G1, Canada
[4] Univ Waterloo, Dept Syst Design Engn, Waterloo, ON N2L 3G1, Canada
Funding
National Natural Science Foundation of China;
Keywords
Point clouds; Feature representation; BEV offset transformer; Focal Conv; Pyramid-like Conv;
DOI
10.1016/j.jag.2024.103688
CLC classification code
TP7 [Remote sensing technology];
Discipline classification code
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
Effective 3D object detection relies on strong feature representation. Both the global features and the representative characteristics of objects are vital for detection. However, the limited perception range of traditional convolution restricts the receptive field, and current object feature representations leave room for improvement. To address these concerns, we introduce a 3D outdoor object detector that enhances point cloud features, referred to as EPDet. Specifically, for global features, we propose a BEV-Offset Transformer in the BEV (Bird's Eye View) domain. This adaptable module enhances semantic connections among objects and suits various 3D detection methods. In addition, to refine point cloud features, we employ Focal Conv as our 3D (3-dimensional) backbone and explore multi-modal fusion effects. In the 2D (2-dimensional) backbone, our Pyramid-like Conv captures detailed contextual features. EPDet performs well in dense-object scenes owing to the scene-wide global features captured by the BEV-Offset Transformer. In multi-class tests, EPDet excels at detecting smaller objects thanks to the more refined features produced by Focal Conv and Pyramid-like Conv. In experiments, we validate the effectiveness of the BEV-Offset Transformer as a plug-and-play module across single-stage (SECOND), two-stage (Voxel-RCNN), and multi-stage (CasA) algorithms. Robustness is evaluated on the KITTI, nuScenes, and ONCE datasets. On the KITTI subset, EPDet (CasA-based) achieves 85.56% accuracy in the car category, and EPDet (Voxel-RCNN-based) surpasses the baseline by 1.65% mAP (mean Average Precision) on the moderate subsets in multi-class detection. The precision of EPDet is on par with SotA (state-of-the-art) point-cloud-based 3D outdoor object detectors.
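The abstract describes the BEV-Offset Transformer only at a high level. Below is a minimal, hypothetical PyTorch sketch of what a plug-and-play transformer block over a BEV feature map could look like, illustrating the general idea of injecting scene-wide global context without changing tensor shapes. The class name, shapes, and the use of plain multi-head self-attention (rather than the paper's specific offset mechanism) are assumptions for illustration and are not the authors' implementation.

import torch
import torch.nn as nn


class BEVSelfAttentionBlock(nn.Module):
    """Hypothetical sketch: self-attention over flattened BEV cells, added back
    residually so the block can be dropped into an existing BEV backbone
    without changing the (B, C, H, W) feature-map shape."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, bev: torch.Tensor) -> torch.Tensor:
        # bev: (B, C, H, W) BEV feature map produced after the 3D-to-BEV projection.
        b, c, h, w = bev.shape
        tokens = bev.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per BEV cell
        x = self.norm(tokens)
        attn_out, _ = self.attn(x, x, x)          # global context across all BEV cells
        tokens = tokens + attn_out                # residual keeps the module plug-and-play
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = BEVSelfAttentionBlock(channels=64)
    dummy_bev = torch.randn(2, 64, 32, 32)        # hypothetical BEV grid size
    print(block(dummy_bev).shape)                 # torch.Size([2, 64, 32, 32])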
Pages: 12