Multi-Modal 3D Object Detection by Box Matching

Cited by: 2

Authors:

Liu, Zhe [1]

Ye, Xiaoqing [2]

Zou, Zhikang [2]

He, Xinwei [3]

Tan, Xiao [2]

Ding, Errui [2]

Wang, Jingdong [2]

Bai, Xiang [4]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China

[2] Baidu Inc, Beijing 100085, Peoples R China

[3] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China

[4] Huazhong Univ Sci & Technol, Sch Software, Wuhan 430074, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2024

Keywords:

Three-dimensional displays; Laser radar; Feature extraction; Cameras; Sensors; Proposals; Object detection; Multi-modal; 3D object detection; feature alignment; box matching;

DOI:

10.1109/TITS.2024.3453963

Chinese Library Classification (CLC) number:

TU [Building Science];

Discipline classification code:

0813;

Abstract:

Multi-modal 3D object detection has received growing attention because the information from different sensors, such as LiDAR and cameras, is complementary. Most fusion methods for 3D detection rely on accurate alignment and calibration between 3D point clouds and RGB images. However, this assumption is unreliable in a real-world self-driving system, where the alignment between modalities is easily disturbed by asynchronous sensors and shifted sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which offers an alternative route to cross-modal feature alignment: it learns the correspondence at the bounding-box level, removing the dependency on calibration during inference. With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their ROI features. Extensive experiments on the nuScenes dataset demonstrate that our method is much more robust than existing fusion methods in challenging cases such as asynchronous sensors, misaligned sensor placement, and degraded camera images. We hope that our FBMNet can provide a practical solution to these challenging cases for safety in real autonomous driving scenarios.
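The box-level fusion idea in the abstract can be illustrated with a minimal sketch. This is not the FBMNet implementation: the function names are hypothetical, the learned assignment is replaced here by a fixed dot-product similarity followed by a softmax, and the 3D and 2D ROI features are assumed to share the same dimensionality. The point it shows is that each 3D proposal picks up image features through proposal-to-proposal matching, with no camera calibration matrix involved.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_by_box_matching(feats_3d, feats_2d):
    """Hypothetical helper: soft-match every 3D proposal to the 2D proposals.

    feats_3d: list of ROI feature vectors, one per 3D proposal.
    feats_2d: list of ROI feature vectors, one per 2D proposal.
    Returns one fused vector per 3D proposal: its own feature
    concatenated with the assignment-weighted sum of 2D features.
    """
    dim_2d = len(feats_2d[0])
    fused = []
    for f3 in feats_3d:
        # Assumption: similarity stands in for the learned assignment score.
        weights = softmax([dot(f3, f2) for f2 in feats_2d])
        matched = [
            sum(w * f2[k] for w, f2 in zip(weights, feats_2d))
            for k in range(dim_2d)
        ]
        fused.append(list(f3) + matched)
    return fused


if __name__ == "__main__":
    feats_3d = [[1.0, 0.0]]
    feats_2d = [[10.0, 0.0], [0.0, 10.0]]
    print(fuse_by_box_matching(feats_3d, feats_2d))
```

In this toy input the single 3D proposal is far more similar to the first 2D proposal, so the matched part of the fused vector is dominated by that proposal's feature. A trained model would replace the dot product with a learned matching module.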

Pages: 19917-19928

Page count: 12

