RAFDet: Range View Augmented Fusion Network for Point-Based 3D Object Detection

Times Cited: 0
Authors
Zheng, Zhijie [1 ]
Huang, Zhicong [1 ]
Zhao, Jingwen [1 ]
Lin, Kang [1 ]
Hu, Haifeng [1 ]
Chen, Dihu [2 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 511400, Peoples R China
[2] Sun Yat Sen Univ, Sch Integrated Circuits, Guangzhou 518107, Peoples R China
Keywords
Feature extraction; Point cloud compression; Three-dimensional displays; Object detection; Laser radar; Convolution; Transformers; Semantics; Robustness; Representation learning; 3D object detection; LiDAR; range view fusion; transformer; R-CNN; VOXELNET
DOI
10.1109/TMM.2025.3535289
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In recent years, point-based methods have achieved promising performance on the 3D object detection task. Although effective, they still suffer from the inherent sparsity of point clouds, which makes it challenging to distinguish objects from the background when relying solely on the raw point view. To this end, we propose a straightforward yet effective multi-view fusion network, termed RAFDet, to alleviate this issue. The core idea of our method is to combine the merits of raw points and their range view to enhance representation learning for sparse point clouds, thus mitigating the sparsity problem and boosting detection performance. In particular, we introduce a novel bidirectional attentive fusion module that equips sparse points with fine-grained, interaction-derived semantic cues during the feature learning process. We then devise a range-view augmented fusion module to fully exploit the complementary relationship between the two perspectives, with the aim of enhancing the original point-view features. Finally, a single-stage detection head predicts the final 3D bounding boxes from the enhanced semantics. We have evaluated our method on the popular KITTI, DAIR-V2X, and Waymo Open datasets. Experimental results on all three datasets demonstrate the effectiveness and robustness of our approach in terms of both detection performance and model complexity.
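The range view used alongside the raw points is typically obtained by spherically projecting each LiDAR point onto a 2D image indexed by azimuth and elevation. The abstract does not give RAFDet's projection parameters, so the sketch below uses illustrative values common for 64-beam LiDAR (3° up, -25° down); the resolution, field of view, and tie-breaking rule are all assumptions, not the paper's specification.

```python
import numpy as np

def points_to_range_view(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image
    via spherical projection. FoV values (degrees) are illustrative
    64-beam settings, not taken from the paper."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # elevation angle

    fov_down_r = np.radians(fov_down)
    fov_total = np.radians(fov_up - fov_down)

    # Normalize angles to integer pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W                     # column from azimuth
    v = (1.0 - (pitch - fov_down_r) / fov_total) * H      # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    range_img = np.full((H, W), -1.0, dtype=np.float32)   # -1 marks empty pixels
    # Write points far-to-near so the nearest point wins each pixel.
    order = np.argsort(-r)
    range_img[v[order], u[order]] = r[order]
    return range_img
```

Each pixel then carries a dense, image-like neighborhood (ranges, and optionally intensity or coordinates as extra channels) that a 2D backbone can process, which is the signal the fusion modules exchange with the sparse point branch.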
Pages: 4167-4180
Page count: 14