Transformer-Based Optimized Multimodal Fusion for 3D Object Detection in Autonomous Driving

Cited by: 3
Authors
Alaba, Simegnew Yihunie [1 ]
Ball, John E. [1 ]
Affiliations
[1] Mississippi State Univ, James Worth Bagley Coll Engn, Dept Elect & Comp Engn, Starkville, MS 39762 USA
Keywords
Laser radar; Three-dimensional displays; Transformers; Point cloud compression; Object detection; Feature extraction; Cameras; Autonomous driving; LiDAR; multimodal fusion; network compression; pruning; quantization; quantization-aware training; sparsity; vision transformer; 3D object detection;
DOI
10.1109/ACCESS.2024.3385439
Chinese Library Classification (CLC): TP [Automation Technology; Computer Technology]
Discipline Code: 0812
Abstract
Accurate 3D object detection is vital for autonomous driving because it enables precise perception of the environment through multiple sensors. Although cameras capture detailed color and texture features, they provide limited depth information and can struggle under adverse weather or lighting conditions. In contrast, LiDAR sensors offer robust depth information but lack the visual detail needed for precise object classification. To address these challenges, this work presents a multimodal fusion model that improves 3D object detection by combining the complementary strengths of LiDAR and camera sensors. The model converts camera images and LiDAR point cloud data into a voxel-based representation, which encoder networks further refine to enhance spatial interaction and reduce semantic ambiguity. The proposed multiresolution attention module, together with the integration of the discrete wavelet transform (DWT) and inverse discrete wavelet transform (IDWT) into the image backbone, improves feature extraction and enhances the fusion of LiDAR depth information with the camera's texture and color detail. The model also incorporates a transformer decoder network with self-attention and cross-attention mechanisms, fostering robust and accurate detection through global interaction between identified objects and encoder features. Furthermore, the proposed network is refined with optimization techniques, including pruning and Quantization-Aware Training (QAT), to maintain competitive performance while significantly reducing memory and computational requirements. Performance evaluations on the nuScenes dataset show that the optimized model architecture delivers competitive accuracy while significantly improving the operational efficiency and effectiveness of multimodal fusion 3D object detection.
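Illustration (a minimal, hypothetical PyTorch sketch, not the authors' released code): the decoder described in the abstract can be approximated by a DETR-style layer in which learned object queries first interact through self-attention and then attend to fused LiDAR-camera encoder features through cross-attention. The class name, dimensions, and token counts below are assumptions for illustration only.

import torch
import torch.nn as nn

class FusionDecoderLayer(nn.Module):
    """Hypothetical decoder layer: self-attention over object queries,
    cross-attention to fused LiDAR + camera encoder tokens."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, queries, fused_tokens):
        # Global interaction among candidate objects (self-attention).
        q = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        # Queries attend to the fused multimodal encoder features (cross-attention).
        q = self.norm2(q + self.cross_attn(q, fused_tokens, fused_tokens)[0])
        # Position-wise feed-forward refinement before the detection heads.
        return self.norm3(q + self.ffn(q))

# Example: 900 object queries attending to 5,000 fused voxel/image tokens.
layer = FusionDecoderLayer()
queries = torch.randn(2, 900, 256)        # (batch, num_queries, d_model)
fused_tokens = torch.randn(2, 5000, 256)  # (batch, num_tokens, d_model)
out = layer(queries, fused_tokens)        # -> (2, 900, 256)

The pruning and QAT steps mentioned in the abstract would correspond, in PyTorch terms, to utilities such as torch.nn.utils.prune and torch.ao.quantization; the paper's exact optimization pipeline is not reproduced here.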
Pages: 50165-50176
Number of pages: 12
Related Papers
50 records in total
  • [1] TransFuser: Imitation With Transformer-Based Sensor Fusion for Autonomous Driving. Chitta, Kashyap; Prakash, Aditya; Jaeger, Bernhard; Yu, Zehao; Renz, Katrin; Geiger, Andreas. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45(11): 12878-12895.
  • [2] AnchorPoint: Query Design for Transformer-Based 3D Object Detection and Tracking. Liu, Hao; Ma, Yanni; Wang, Hanyun; Zhang, Chaobo; Guo, Yulan. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24(10): 10988-11000.
  • [3] RI-Fusion: 3D Object Detection Using Enhanced Point Features With Range-Image Fusion for Autonomous Driving. Zhang, Xinyu; Wang, Li; Zhang, Guoxin; Lan, Tianwei; Zhang, Haoming; Zhao, Lijun; Li, Jun; Zhu, Lei; Liu, Huaping. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72.
  • [4] Performance and Challenges of 3D Object Detection Methods in Complex Scenes for Autonomous Driving. Wang, Ke; Zhou, Tianqiang; Li, Xingcan; Ren, Fan. IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8(02): 1699-1716.
  • [5] 3D-DFM: Anchor-Free Multimodal 3-D Object Detection With Dynamic Fusion Module for Autonomous Driving. Lin, Chunmian; Tian, Daxin; Duan, Xuting; Zhou, Jianshan; Zhao, Dezong; Cao, Dongpu. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34(12): 10812-10822.
  • [6] Multimodal 3D Object Detection Based on Sparse Interaction in Internet of Vehicles. Li, Hui; Ge, Tongao; Bai, Keqiang; Nie, Gaofeng; Xu, Lingwei; Ai, Xiaoxue; Cao, Song. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2025, 74(02): 2174-2186.
  • [7] Channelwise and Spatially Guided Multimodal Feature Fusion Network for 3-D Object Detection in Autonomous Vehicles. Uzair, Muhammad; Dong, Jian; Shi, Ronghua; Mushtaq, Husnain; Ullah, Irshad. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62.
  • [8] A Systematic Survey of Transformer-Based 3D Object Detection for Autonomous Driving: Methods, Challenges and Trends. Zhu, Minling; Gong, Yadong; Tian, Chunwei; Zhu, Zuyuan. DRONES, 2024, 8(08).
  • [9] Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection for Autonomous Driving. Yuan, Zhenxun; Song, Xiao; Bai, Lei; Wang, Zhe; Ouyang, Wanli. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32(04): 2068-2078.
  • [10] Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection. Li, Jiahao; Chen, Lingshan; Li, Zhen. IEEE ACCESS, 2025, 13: 52385-52396.