Transformer-Based Optimized Multimodal Fusion for 3D Object Detection in Autonomous Driving

Cited by: 3
Authors
Alaba, Simegnew Yihunie [1]
Ball, John E. [1]
Affiliations
[1] Mississippi State Univ, James Worth Bagley Coll Engn, Dept Elect & Comp Engn, Starkville, MS 39762 USA
Keywords
Laser radar; Three-dimensional displays; Transformers; Point cloud compression; Object detection; Feature extraction; Cameras; Autonomous driving; LiDAR; multimodal fusion; network compression; pruning; quantization; quantization-aware training; sparsity; vision transformer; 3D object detection
DOI
10.1109/ACCESS.2024.3385439
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Accurate 3D object detection is vital for autonomous driving since it facilitates accurate perception of the environment through multiple sensors. Although cameras can capture detailed color and texture features, they have limitations regarding depth information. Additionally, they can struggle under adverse weather or lighting conditions. In contrast, LiDAR sensors offer robust depth information but lack the visual detail for precise object classification. This work presents a multimodal fusion model that improves 3D object detection by combining the benefits of LiDAR and camera sensors to address these challenges. This model processes camera images and LiDAR point cloud data into a voxel-based representation, further refined by encoder networks to enhance spatial interaction and reduce semantic ambiguity. The proposed multiresolution attention module and the integration of the discrete wavelet transform and inverse discrete wavelet transform into the image backbone improve the feature extraction capability. This approach enhances the fusion of LiDAR depth information with the camera's textural and color detail. The model also incorporates a transformer decoder network with self-attention and cross-attention mechanisms, fostering robust and accurate detection through global interaction between identified objects and encoder features. Furthermore, the proposed network is refined with advanced optimization techniques, including pruning and Quantization-Aware Training (QAT), to maintain competitive performance while significantly decreasing the need for memory and computational resources. Performance evaluations on the nuScenes dataset show that the optimized model architecture offers competitive results and significantly improves operational efficiency and effectiveness in multimodal fusion 3D object detection.
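The abstract names pruning and Quantization-Aware Training as the compression steps but does not say how they are configured. The following is only a minimal NumPy sketch of the two general ideas, not the authors' implementation: unstructured magnitude pruning, and the symmetric quantize-dequantize ("fake quantization") step that QAT typically simulates in the forward pass. The function names `magnitude_prune` and `fake_quantize`, the 50% sparsity, and the 8-bit width are all illustrative assumptions.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.

    Hypothetical helper for illustration; returns (pruned_weights, keep_mask).
    Ties at the threshold are pruned too, so slightly more may be removed.
    """
    magnitudes = np.abs(weights).ravel()
    k = int(sparsity * magnitudes.size)  # number of weights to drop
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    keep_mask = np.abs(weights) > threshold
    return weights * keep_mask, keep_mask


def fake_quantize(x, num_bits=8):
    """Symmetric per-tensor quantize-dequantize (assumed scheme, for illustration).

    Values are snapped to a signed integer grid and mapped back to floats,
    which exposes the rounding error a quantized model would see.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    scale = max_abs / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))

w_pruned, mask = magnitude_prune(w, sparsity=0.5)
w_q = fake_quantize(w_pruned, num_bits=8)

print("kept fraction:", mask.mean())
print("max quantization error:", np.max(np.abs(w_q - w_pruned)))
```

Pruned weights stay exactly zero after fake quantization because zero lies on the symmetric grid, so the two steps compose without undoing the sparsity. In actual QAT the rounding is usually paired with a straight-through gradient estimator during training, which this forward-only sketch leaves out.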
Pages: 50165-50176
Page count: 12