CSA-RCNN: Cascaded Self-Attention Networks for High-Quality 3-D Object Detection From LiDAR Point Clouds

Times cited: 0
Authors
Liu, Ajian [1]
Yuan, Liang [2]
Chen, Juan [1]
Affiliations
[1] Beijing University of Chemical Technology, College of Information Science and Technology, Beijing 100029, People's Republic of China
[2] Shanghai Jiao Tong University, Institute of Cultural and Creative Industry, Shanghai 200240, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Proposals; Point cloud compression; Object detection; Accuracy; Laser radar; Detectors; Transformers; Semantics; Aggregates; 3-D object detection; cascade paradigm; point clouds; self-attention; transformer feature fusion;
DOI
10.1109/TIM.2024.3476690
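Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract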
LiDAR-based 3-D object detection leverages the precise spatial information provided by point clouds to enhance the understanding of 3-D environments. This approach has attracted significant attention from both industry and academia, particularly in autonomous driving and robotics. However, improving the detection accuracy for long-distance objects remains a critical challenge for existing two-stage methods. This difficulty stems primarily from the sparsity and uneven distribution of point clouds, which lead to inconsistent proposal quality for distant targets. To tackle these challenges, this article proposes CSA-RCNN, a novel and effective 3-D point cloud detection network based on a cascaded self-attention (CSA) region-based convolutional neural network (RCNN), to achieve higher quality 3-D object detection in traffic scenes. First, to enhance the quality of proposals for long-range objects, we design a cascade self-attention module (CSM) that applies a multihead self-attention (MHSA) mechanism across multiple independent cascaded subnetworks to aggregate proposal features from different stages. By strengthening feature modeling across stages, this module improves the accuracy of iterative proposal refinement. Second, to strengthen the correlation between different point cloud representations, we design a transformer-based feature fusion module that integrates these multisource features into richer point-wise features. Finally, to remove unnecessary background information from the 3-D scene, we introduce a semantic-guided farthest point sampling (S-FPS) strategy that helps preserve essential foreground points during downsampling. Extensive experiments on the highly competitive KITTI and Waymo datasets validate the effectiveness of the proposed method.
Notably, CSA-RCNN achieves a +1.01% improvement in average precision (AP) for the car class at the difficult level, compared with point-voxel RCNN (PV-RCNN) on the KITTI validation set.
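The abstract describes S-FPS only at a high level. The Python sketch below shows one common way to bias farthest point sampling toward foreground points. It assumes that each point already carries a foreground score in [0, 1], for example from a hypothetical per-point segmentation head. It also assumes a selection criterion of the form score^gamma × (distance to the nearest already-selected point), along the lines of the S-FPS formulation in SASA (Chen et al., AAAI 2022). Whether CSA-RCNN uses exactly this weighting is an assumption. The function name `semantic_fps` and the parameter `gamma` are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def semantic_fps(points: Sequence[Point], scores: Sequence[float],
                 k: int, gamma: float = 1.0) -> List[int]:
    """Pick k point indices by score-weighted farthest point sampling (sketch).

    Assumption: scores[i] in [0, 1] is a foreground confidence for point i.
    Each step selects the unselected point maximizing
    scores[i] ** gamma * min_dist[i], where min_dist[i] is the distance from
    point i to its nearest already-selected point. gamma = 0 with uniform
    scores reduces to plain FPS.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(points)")
    if len(scores) != n:
        raise ValueError("need exactly one score per point")

    # Design choice (assumed): seed with the most confident foreground point.
    first = max(range(n), key=lambda i: scores[i])
    selected = [first]
    chosen = {first}
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(selected) < k:
        # Only unselected points are candidates, so no index repeats even if
        # every remaining weighted distance is zero.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: (scores[i] ** gamma) * min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Refresh nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
    # Point 2 is far away but has a low foreground score.
    print(semantic_fps(pts, [0.9, 0.8, 0.05, 0.7], k=3))  # [0, 3, 1]
    # Uniform scores behave like plain FPS and pick the far point 2.
    print(semantic_fps(pts, [1.0] * 4, k=3))  # [0, 2, 3]
```

The two calls show the intended effect. Plain FPS spends a sample on the distant low-score point because it maximizes coverage. The score-weighted variant keeps a nearby high-score point instead, which matches the stated goal of preserving foreground points during downsampling.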
Pages: 13