CSA-RCNN: Cascaded Self-Attention Networks for High-Quality 3-D Object Detection From LiDAR Point Clouds

Cited by: 0

Authors:

Liu, Ajian [1]

Yuan, Liang [2]

Chen, Juan [1]

Affiliations:

[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China

[2] Shanghai Jiao Tong Univ, Inst Cultural & Creat Ind, Shanghai 200240, Peoples R China

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2024 / Vol. 73

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Proposals; Point cloud compression; Object detection; Accuracy; Laser radar; Detectors; Transformers; Semantics; Aggregates; 3-D object detection; cascade paradigm; point clouds; self-attention; transformer feature fusion;

DOI:

10.1109/TIM.2024.3476690

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

LiDAR-based 3-D object detection leverages the precise spatial information provided by point clouds to enhance the understanding of 3-D environments. This approach has garnered significant attention from both industry and academia, particularly in fields such as autonomous driving and robotics. However, improving the detection accuracy of long-distance objects remains a critical challenge for existing two-stage methods. This difficulty is primarily attributed to the sparsity and uneven distribution of point clouds, which lead to inconsistent proposal quality for distant targets. To tackle these challenges, this article proposes a novel and effective 3-D point cloud detection network, the cascaded self-attention region-based convolutional neural network (CSA-RCNN), to achieve higher quality 3-D object detection in traffic scenes. First, to enhance the quality of proposals for long-range objects, we design a cascade self-attention module (CSM) that utilizes a multihead self-attention (MHSA) mechanism across multiple independent cascaded subnetworks to aggregate proposal features at different stages. This approach improves the accuracy of iterative proposal refinement by strengthening the feature modeling across different stages. Second, to enhance the correlation between different representations of the point cloud, we design a transformer-based feature fusion module that fully integrates these multisource features into richer point-wise features. Finally, to remove unnecessary background information from the 3-D scene, we introduce a semantic-guided farthest point sampling (S-FPS) strategy that helps preserve essential foreground points during the downsampling process. Extensive experiments conducted on the highly competitive KITTI and Waymo datasets validate the effectiveness of the proposed method. Notably, CSA-RCNN achieves a +1.01% improvement in average precision (AP) for the car class at the difficult level compared with point-voxel RCNN (PV-RCNN) on the KITTI validation dataset.
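
The S-FPS step mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common formulation of semantic-guided sampling, in which the usual farthest-point distance is scaled by each point's foreground score raised to a power gamma; the abstract does not give the authors' exact formulation, so the function name, the gamma parameter, and the choice of starting point are hypothetical. In a real detector the scores would come from a learned segmentation head rather than being supplied by hand.

```python
import math


def semantic_fps(points, scores, num_samples, gamma=1.0):
    """Select num_samples point indices by score-weighted farthest point sampling.

    points: list of (x, y, z) tuples.
    scores: per-point foreground probabilities in [0, 1] (assumed given).
    gamma:  hypothetical exponent; gamma = 0 reduces to plain FPS distances.
    """
    n = len(points)
    if len(scores) != n:
        raise ValueError("points and scores must have the same length")
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be between 1 and len(points)")

    # Assumption: start from the point most likely to be foreground.
    first = max(range(n), key=lambda i: scores[i])
    selected = [first]
    chosen = {first}

    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(points[i], points[first]) for i in range(n)]

    while len(selected) < num_samples:
        # Plain FPS maximizes min_dist alone; here the distance is scaled by
        # score ** gamma so that likely-background points are rarely picked.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i] * scores[i] ** gamma)
        selected.append(nxt)
        chosen.add(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return selected
```

With uniform scores the selection depends only on distance, as in ordinary farthest point sampling, while low scores on background points steer the sampled subset toward the foreground.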

Pages: 13
