Channelwise and Spatially Guided Multimodal Feature Fusion Network for 3-D Object Detection in Autonomous Vehicles

Cited: 6
Authors
Uzair, Muhammad [1 ]
Dong, Jian [2 ]
Shi, Ronghua [2 ]
Mushtaq, Husnain [1 ]
Ullah, Irshad [1 ]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[2] Cent South Univ, Sch Elect Informat, Changsha 410083, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024, Vol. 62
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Three-dimensional displays; Laser radar; Point cloud compression; Object detection; Semantics; Cameras; Object recognition; Convolutional neural networks; Accuracy; 3-D object detection; class-based point sampling; multimodal fusion; self-attention; semantic feature learning;
DOI
10.1109/TGRS.2024.3476072
CLC Classification Number
P3 [Geophysics]; P59 [Geochemistry];
Discipline Classification Codes
0708 ; 070902 ;
Abstract
Accurate 3-D object detection is vital in autonomous driving. Traditional LiDAR-only models struggle with sparse point clouds. We propose a novel approach that integrates LiDAR and camera data, exploiting the strengths of each sensor while overcoming their individual limitations to enhance 3-D object detection. Our research introduces the channelwise and spatially guided multimodal feature fusion network (CSMNET) for 3-D object detection. First, our method enhances LiDAR data by projecting it onto a 2-D plane, enabling the extraction of class-specific features from a probability map. Second, we design class-based farthest point sampling (C-FPS), which boosts the selection of foreground points by weighting points according to geometric or probability features while ensuring diversity among the selected points. Third, we develop a parallel attention (PAT)-based multimodal fusion mechanism that achieves higher resolution than raw LiDAR points. This fusion mechanism integrates two attention mechanisms: channel attention for LiDAR data and spatial attention for camera data. These mechanisms enhance the utilization of semantic features within a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from the LiDAR and camera sources. On the KITTI dataset, CSMNET achieves an average precision (AP) in bird's eye view (BEV) detection of 90.16% (easy), 85.18% (moderate), and 80.51% (hard), with a mean AP (mAP) of 85.12%. In 3-D detection, CSMNET attains 82.05% (easy), 72.64% (moderate), and 67.10% (hard), with an mAP of 73.75%. For 2-D detection, the scores are 95.47% (easy), 93.25% (moderate), and 86.68% (hard), yielding an mAP of 91.72%.
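The abstract describes C-FPS as farthest point sampling biased toward foreground points via per-point weights while preserving spatial diversity. The record contains no pseudocode, so the following is a minimal illustrative sketch of that idea: greedy farthest-point selection where each candidate's distance to the current sample set is scaled by a foreground weight (assumed here to come from the 2-D probability map). The function name and the multiplicative weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def weighted_fps(points, weights, k):
    """Illustrative class-weighted farthest point sampling.

    points  : (N, 3) array of LiDAR point coordinates.
    weights : (N,) per-point foreground scores (e.g., from a probability map).
    k       : number of points to select.

    Greedily picks the point maximizing (distance to nearest selected
    point) * weight, so likely-foreground points are preferred while the
    distance term still enforces spatial diversity among selections.
    """
    # Seed with the highest-weight (most likely foreground) point.
    selected = [int(np.argmax(weights))]
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        # Score = distance to the current sample set, scaled by weight;
        # already-selected points have distance 0 and are never re-picked.
        nxt = int(np.argmax(min_dist * weights))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(points - points[nxt], axis=1)
        )
    return np.array(selected)

# Example: sample 16 of 100 random points with random foreground scores.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
w = rng.uniform(0.1, 1.0, size=100)
idx = weighted_fps(pts, w, 16)
```

Plain FPS is the special case where all weights are equal; raising the weight of foreground classes skews the sample toward objects of interest, which is the stated motivation for C-FPS.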
Pages: 15