Multi-Modal 3D Object Detection by Box Matching

Cited by: 2
Authors
Liu, Zhe [1]
Ye, Xiaoqing [2]
Zou, Zhikang [2]
He, Xinwei [3]
Tan, Xiao [2]
Ding, Errui [2]
Wang, Jingdong [2]
Bai, Xiang [4]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
[2] Baidu Inc, Beijing 100085, Peoples R China
[3] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Software, Wuhan 430074, Peoples R China
Keywords
Three-dimensional displays; Laser radar; Feature extraction; Cameras; Sensors; Proposals; Object detection; Multi-modal; 3D object detection; feature alignment; box matching
DOI
10.1109/TITS.2024.3453963
Chinese Library Classification
TU [Architectural Science]
Discipline classification code
0813
Abstract
Multi-modal 3D object detection has received growing attention because the information from different sensors, such as LiDAR and cameras, is complementary. Most fusion methods for 3D detection rely on accurate alignment and calibration between 3D point clouds and RGB images. However, this assumption is not reliable in a real-world self-driving system, as the alignment between modalities is easily affected by asynchronous sensors and disturbed sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which provides an alternative way to align cross-modal features: it learns the correspondence at the bounding box level, removing the dependence on calibration during inference. With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their ROI features. Extensive experiments on the nuScenes dataset demonstrate that our method is much more robust than existing fusion methods in challenging cases such as asynchronous sensors, misaligned sensor placement, and degraded camera images. We hope that our method can provide a practical solution to these challenging cases for safety in real autonomous driving scenarios.
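The box-level matching and ROI feature fusion described in the abstract can be illustrated with a minimal sketch. It is not the FBMNet architecture: the function name `match_and_fuse`, the `temperature` parameter, the dot-product similarity, and fusion by concatenation are all assumptions made for illustration, and the learned matching layers are replaced here by a fixed similarity score.

```python
import numpy as np


def match_and_fuse(feats_3d, feats_2d, temperature=1.0):
    """Soft-assign each 3D proposal to the 2D proposals and fuse ROI features.

    Hypothetical helper, not taken from the paper.
    feats_3d: (N, C) ROI features of N 3D proposals.
    feats_2d: (M, C) ROI features of M 2D proposals.
    Returns the (N, M) assignment matrix and the (N, 2C) fused features.
    """
    channels = feats_3d.shape[1]
    # Assumption: scaled dot-product similarity stands in for a learned matcher.
    logits = feats_3d @ feats_2d.T / (temperature * np.sqrt(channels))
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    assignment = weights / weights.sum(axis=1, keepdims=True)
    # Each 3D proposal gathers a weighted mix of the 2D ROI features.
    matched_2d = assignment @ feats_2d
    # Assumption: fusion by simple concatenation of the two feature vectors.
    fused = np.concatenate([feats_3d, matched_2d], axis=1)
    return assignment, fused


rng = np.random.default_rng(0)
roi_3d = rng.normal(size=(4, 8))   # 4 LiDAR proposals, 8-dim features
roi_2d = rng.normal(size=(6, 8))   # 6 image proposals, 8-dim features
assignment, fused = match_and_fuse(roi_3d, roi_2d)
print(assignment.shape, fused.shape)
```

Because the assignment is computed from proposal features alone, this sketch uses no projection matrix, which is the property the abstract highlights for calibration-free inference.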
Pages: 19917-19928
Number of pages: 12
Related papers
50 in total
  • [1] GraphAlign++: An Accurate Feature Alignment by Graph Matching for Multi-Modal 3D Object Detection
    Song, Ziying
    Jia, Caiyan
    Yang, Lei
    Wei, Haiyue
    Liu, Lin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2619 - 2632
  • [2] Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection
    Li, Jiahao
    Chen, Lingshan
    Li, Zhen
    IEEE ACCESS, 2025, 13 : 52385 - 52396
  • [3] Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection
    Liu, Zhanwen
    Cheng, Juanru
    Fan, Jin
    Lin, Shan
    Wang, Yang
    Zhao, Xiangmo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 707 - 717
  • [4] EPNet++: Cascade Bi-Directional Fusion for Multi-Modal 3D Object Detection
    Liu, Zhe
    Huang, Tengteng
    Li, Bingling
    Chen, Xiwu
    Wang, Xi
    Bai, Xiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (07) : 8324 - 8341
  • [5] A Multi-Modal Fusion-Based 3D Multi-Object Tracking Framework With Joint Detection
    Wang, Xiyang
    Fu, Chunyun
    He, Jiawei
    Huang, Mingguang
    Meng, Ting
    Zhang, Siyu
    Zhou, Hangning
    Xu, Ziyao
    Zhang, Chi
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (01) : 532 - 539
  • [6] Bridging the View Disparity Between Radar and Camera Features for Multi-Modal Fusion 3D Object Detection
    Zhou, Taohua
    Chen, Junjie
    Shi, Yining
    Jiang, Kun
    Yang, Mengmeng
    Yang, Diange
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8 (02) : 1523 - 1535
  • [7] Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
    Li, Xin
    Shi, Botian
    Hou, Yuenan
    Wu, Xingjiao
    Ma, Tianlong
    Li, Yikang
    He, Liang
    COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 691 - 707
  • [8] Deformable Feature Fusion Network for Multi-Modal 3D Object Detection
    Guo, Kun
    Gan, Tong
    Ding, Zhao
    Ling, Qiang
    2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024 : 363 - 367
  • [9] Multi-modal Data Analysis and Fusion for Robust Object Detection in 2D/3D Sensing
    Schierl, Jonathan
    Graehling, Quinn
    Aspiras, Theus
    Asari, Vijay
    Van Rynbach, Andre
    Rabb, Dave
    2020 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR): TRUSTED COMPUTING, PRIVACY, AND SECURING MULTIMEDIA, 2020
  • [10] Exploiting Multi-Modal Synergies for Enhancing 3D Multi-Object Tracking
    Xu, Xinglong
    Ren, Weihong
    Chen, Xi'ai
    Fan, Huijie
    Han, Zhi
    Liu, Honghai
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (10) : 8643 - 8650