Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection

Cited by: 4
Authors
Liu, Zhanwen [1 ]
Cheng, Juanru [1 ]
Fan, Jin [1 ]
Lin, Shan [1 ]
Wang, Yang [2 ]
Zhao, Xiangmo [1 ]
Affiliations
[1] Changan Univ, Sch Informat Engn, Xian 710064, Peoples R China
[2] Univ Sci & Technol China, Sch Informat, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Point cloud compression; Three-dimensional displays; Feature extraction; Object detection; Laser radar; Image color analysis; Detectors; Deep learning; 3D object detection; sensor fusion; Lidar sensor; camera sensor;
DOI
10.1109/TMM.2023.3270638
Chinese Library Classification
TP [Automation & Computer Technology];
Discipline code
0812 ;
Abstract
Lidars and cameras are critical sensors for 3D object detection in autonomous driving. Despite the increasing popularity of sensor fusion in this field, accurate and robust fusion methods are still under exploration due to the heterogeneous representations of the two modalities. In this paper, we find that the complementary roles of point clouds and images vary with depth. An important reason is that the appearance of point clouds changes significantly with increasing distance from the Lidar, while the image's edge, color, and texture information are not sensitive to depth. To address this, we propose a fusion module based on the Depth Attention Mechanism (DAM), which mainly consists of two operations: gated feature generation and point cloud division. The former adaptively learns the importance of bimodal features without additional annotations, while the latter divides point clouds to achieve differentiated fusion of multi-modal features at different depths. This fusion module enhances the representation ability of the original features for different point sets and provides more comprehensive features through a dual splicing strategy combining concatenation and index connection. Additionally, treating point density as a feature and exploiting its negative correlation with depth, we build an Adaptive Threshold Generation Network (ATGN) that generates the depth threshold by extracting density information, enabling a more reasonable division of point clouds. Experiments on the KITTI dataset demonstrate the effectiveness and competitiveness of our proposed models.
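The two core operations named in the abstract, gated feature generation and depth-based point cloud division, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the function names (`gated_fuse`, `split_by_depth`), the scalar sigmoid gate, and the fixed depth threshold are all assumptions for illustration; in the paper the gate weights are learned end-to-end and the threshold is produced by the ATGN from point-density statistics.

```python
import math


def sigmoid(x):
    """Standard logistic function, used here to squash the gate to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(img_feat, pc_feat, weights, bias):
    """Gated feature generation (toy version).

    A scalar gate g is computed from the concatenated bimodal features
    [f_img ; f_pc]; the fused feature is the gate-weighted blend
    g * f_img + (1 - g) * f_pc, so the network can adaptively favor
    one modality without extra annotations.
    """
    concat = img_feat + pc_feat  # list concatenation: [f_img ; f_pc]
    z = sum(w * x for w, x in zip(weights, concat)) + bias
    g = sigmoid(z)  # learned importance of the image branch
    return [g * a + (1.0 - g) * b for a, b in zip(img_feat, pc_feat)]


def split_by_depth(points, threshold):
    """Point cloud division (toy version).

    Points are divided into near/far subsets by a depth threshold so
    each subset can receive a differently weighted multi-modal fusion.
    In the paper this threshold comes from the ATGN, which exploits the
    negative correlation between point density and depth.
    """
    near = [p for p in points if p["depth"] <= threshold]
    far = [p for p in points if p["depth"] > threshold]
    return near, far
```

With zero gate weights the gate is 0.5 and `gated_fuse` degenerates to a plain average of the two modalities; training would push the weights so that, e.g., far points (sparse Lidar returns) lean more on image features.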
Pages: 707-717
Page count: 11