Channelwise and Spatially Guided Multimodal Feature Fusion Network for 3-D Object Detection in Autonomous Vehicles

Cited by: 2
Authors
Uzair, Muhammad [1 ]
Dong, Jian [2 ]
Shi, Ronghua [2 ]
Mushtaq, Husnain [1 ]
Ullah, Irshad [1 ]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[2] Cent South Univ, Sch Elect Informat, Changsha 410083, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024, Vol. 62
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Three-dimensional displays; Laser radar; Point cloud compression; Object detection; Semantics; Cameras; Object recognition; Convolutional neural networks; Accuracy; 3-D object detection; class-based point sampling; multimodal fusion; self-attention; semantic feature learning;
DOI
10.1109/TGRS.2024.3476072
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry];
Subject Classification Code
0708; 070902;
Abstract
Accurate 3-D object detection is vital in autonomous driving, but LiDAR-only models struggle with sparse point clouds. We propose a novel approach that integrates LiDAR and camera data to exploit the strengths of each sensor while overcoming their individual limitations. Our research introduces the channelwise and spatially guided multimodal feature fusion network (CSMNET) for 3-D object detection. First, our method enhances LiDAR data by projecting it onto a 2-D plane, enabling the extraction of class-specific features from a probability map. Second, we design class-based farthest point sampling (C-FPS), which boosts the selection of foreground points by weighting points according to geometric or probability features while preserving diversity among the selected points. Third, we develop a parallel attention (PAT)-based multimodal fusion mechanism that achieves higher resolution than raw LiDAR points. This mechanism combines two attention streams: channel attention for LiDAR data and spatial attention for camera data. Together they enhance the use of semantic features within a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of LiDAR and camera information. On the KITTI dataset, CSMNET achieves an average precision (AP) in bird's eye view (BEV) detection of 90.16% (easy), 85.18% (moderate), and 80.51% (hard), with a mean AP (mAP) of 85.12%. In 3-D detection, CSMNET attains 82.05% (easy), 72.64% (moderate), and 67.10% (hard), with an mAP of 73.75%. For 2-D detection, the scores are 95.47% (easy), 93.25% (moderate), and 86.68% (hard), yielding an mAP of 91.72%.
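The two abstract components most amenable to a short illustration are the class-weighted point sampling (C-FPS) and the channel-plus-spatial attention fusion (PAT). The NumPy sketch below shows the general ideas only, under stated assumptions; the function names, the parameter-free pooling-based gates, and the additive fusion are illustrative choices, not the authors' implementation (the real PAT module is a learned network).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def class_weighted_fps(points, fg_weights, k):
    """Greedy farthest-point sampling whose distance score is scaled by a
    per-point foreground weight, so confident foreground points are favored
    while spatial diversity is preserved (the C-FPS idea)."""
    n = points.shape[0]
    chosen = [int(np.argmax(fg_weights))]      # seed with the most confident point
    min_dist = np.full(n, np.inf)
    for _ in range(k - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        min_dist = np.minimum(min_dist, d)     # distance to nearest chosen point
        score = min_dist * fg_weights          # weight-scaled farthest-point score
        score[chosen] = -np.inf                # never re-select a point
        chosen.append(int(np.argmax(score)))
    return np.asarray(chosen)

def parallel_attention_fuse(lidar_feat, cam_feat):
    """Fuse a LiDAR feature map via a channel gate and a camera feature map
    via a spatial gate; both inputs are (C, H, W) arrays of equal shape."""
    ch_gate = sigmoid(lidar_feat.mean(axis=(1, 2)))  # (C,) channel attention
    sp_gate = sigmoid(cam_feat.mean(axis=0))         # (H, W) spatial attention
    return lidar_feat * ch_gate[:, None, None] + cam_feat * sp_gate[None, :, :]
```

For example, passing 3-D points together with per-point foreground probabilities to `class_weighted_fps` returns `k` distinct indices biased toward likely foreground, and `parallel_attention_fuse` returns a fused map with the same `(C, H, W)` shape as its inputs.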
Pages: 15