BCAF-3D: Bilateral Content Awareness Fusion for cross-modal 3D object detection

Cited: 5
Authors
Chen, Mu [1,2,3,4]
Liu, Pengfei [1,2,3]
Zhao, Huaici [1,2,3]
Affiliations
[1] Chinese Acad Sci, Key Lab Optoelect Informat Proc, Shenyang 110016, Liaoning, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110016, Liaoning, Peoples R China
[3] Chinese Acad Sci, Inst Robot & Intelligent Mfg, Shenyang 110016, Liaoning, Peoples R China
[4] Univ Chinese Acad Sci, Beijing 100094, Peoples R China
Keywords
Cross-modal fusion; 3D object detection; Autonomous driving; ATTENTION;
DOI
10.1016/j.knosys.2023.110952
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
As the two major data modalities in autonomous driving, LiDAR point clouds and RGB images carry rich geometric cues and semantic features, respectively. Compared with using a single modality, fusing the two can provide complementary information for the 3D object detection task. However, some prevalent cross-modal methods (Vora et al., 2020; Huang et al., 2020; Sindagi et al., 2019) cannot effectively select favorable information and adopt only a unilateral fusion mechanism. In this paper, we propose a novel fusion strategy named Bilateral Content Awareness Fusion (BCAF) to address these issues. Specifically, BCAF adopts a two-stream structure consisting of a LiDAR Content Awareness (LCA) branch and an Image Content Awareness (ICA) branch, together with a Soft Fusion (SF) module. First, the LCA and ICA branches enhance instance-relevant cues. Then, from the two awareness features they produce, aggregation features are generated to select favorable image and LiDAR features. Finally, the SF module fuses the bilateral favorable features and outputs the cross-modal feature. Experiments are conducted on the KITTI dataset, covering both 3D object detection and bird's-eye-view evaluation. Compared with the previous state-of-the-art method, our approach achieves significant improvements: on the Car category, it gains 0.5 and 0.62 mean Average Precision (mAP) points on the 3D object detection and bird's-eye-view tasks, respectively. © 2023 Elsevier B.V. All rights reserved.
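This record does not include the paper's equations, but the kind of soft, bilateral weighting the abstract attributes to the SF module can be illustrated with a minimal sketch. Everything here (the function name `soft_fusion`, the two-way softmax gating, per-channel awareness scores) is an assumption for illustration, not the paper's actual implementation:

```python
import math

def soft_fusion(lidar_feat, image_feat, lidar_score, image_score):
    """Per-channel soft fusion sketch: awareness scores from the two
    branches are turned into gates (a two-way softmax) that weight how
    much each modality contributes to the fused cross-modal feature."""
    fused = []
    for fl, fi, sl, si in zip(lidar_feat, image_feat, lidar_score, image_score):
        el, ei = math.exp(sl), math.exp(si)
        gl, gi = el / (el + ei), ei / (el + ei)  # gates sum to 1
        fused.append(gl * fl + gi * fi)
    return fused

# Equal awareness scores split the contribution evenly.
print(soft_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]))  # [0.5, 0.5]
```

The softness of the gate is the point: rather than hard-selecting one modality per channel, each fused channel remains a differentiable mixture, so gradients reach both the LiDAR and image streams during training.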
Pages: 12
References
61 in total
[1]  
Sindagi VA, 2019, Arxiv, DOI arXiv:1904.01649
[2]   M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [J].
Brazil, Garrick ;
Liu, Xiaoming .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9286-9295
[3]   A Hierarchical Graph Network for 3D Object Detection on Point Clouds [J].
Chen, Jintai ;
Lei, Biwen ;
Song, Qingyu ;
Ying, Haochao ;
Chen, Danny Z. ;
Wu, Jian .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :389-398
[4]   M3DGAF: Monocular 3D Object Detection With Geometric Appearance Awareness and Feature Fusion [J].
Chen, Mu ;
Liu, Pengfei ;
Zhao, Huaici .
IEEE SENSORS JOURNAL, 2023, 23 (11) :11232-11240
[5]   LiDAR-camera fusion: Dual transformer enhancement for 3D object detection [J].
Chen, Mu ;
Liu, Pengfei ;
Zhao, Huaici .
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 120
[6]  
Chen XZ, 2017, Arxiv, DOI [arXiv:1611.07759, 10.48550/ARXIV.1611.07759]
[7]   Monocular 3D Object Detection for Autonomous Driving [J].
Chen, Xiaozhi ;
Kundu, Kaustav ;
Zhang, Ziyu ;
Ma, Huimin ;
Fidler, Sanja ;
Urtasun, Raquel .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :2147-2156
[8]  
Chen XZ, 2015, ADV NEUR IN, V28
[9]   MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships [J].
Chen, Yongjian ;
Tai, Lei ;
Sun, Kai ;
Li, Mingyang .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :12090-12099
[10]  
Engelcke Martin, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA), P1355, DOI 10.1109/ICRA.2017.7989161