PPF-Det: Point-Pixel Fusion for Multi-Modal 3D Object Detection

Cited: 6
Authors
Xie, Guotao [1 ,2 ]
Chen, Zhiyuan [1 ]
Gao, Ming [1 ,2 ]
Hu, Manjiang [1 ,2 ]
Qin, Xiaohui [1 ,2 ]
Affiliations
[1] Hunan Univ, Coll Mech & Vehicle Engn, State Key Lab Adv Design & Mfg Technol Vehicle, Changsha 410082, Peoples R China
[2] Hunan Univ, Wuxi Intelligent Control Res Inst, Wuxi 214115, Jiangsu, Peoples R China
Keywords
Autonomous driving; 3D object detection; camera-LiDAR fusion; intelligent transportation systems
DOI
10.1109/TITS.2023.3347078
Chinese Library Classification
TU [Building Science]
Discipline Code
0813
Abstract
Multi-modal fusion can exploit both LiDAR and camera data to boost the robustness and performance of 3D object detection. However, it remains challenging to comprehensively exploit image information and to perform accurate interactive fusion of diverse features. In this paper, we propose a novel multi-modal framework, Point-Pixel Fusion for Multi-Modal 3D Object Detection (PPF-Det). PPF-Det consists of three submodules that address the above problems: Multi Pixel Perception (MPP), Shared Combined Point Feature Encoder (SCPFE), and Point-Voxel-Wise Triple Attention Fusion (PVW-TAF). First, MPP makes full use of image semantic information to mitigate the resolution mismatch between the point cloud and the image. In addition, we propose SCPFE to preliminarily extract point cloud features and point-pixel features simultaneously, reducing time consumption in 3D space. Lastly, we propose a fine alignment fusion strategy, PVW-TAF, which generates multi-level voxel-fused features based on an attention mechanism. Extensive experiments on the KITTI benchmark, conducted on September 24, 2023, demonstrate that our method achieves excellent performance.
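The fusion idea described in the abstract — weighting per-point LiDAR features against their corresponding image (pixel) features with an attention mechanism — can be illustrated by a minimal sketch. This is a hypothetical toy version, not the paper's PVW-TAF module: the function name `attention_fuse` and the random projection standing in for a learned attention network are assumptions for illustration only.

```python
import numpy as np

def attention_fuse(point_feats, pixel_feats, rng=None):
    """Toy attention-weighted fusion of per-point LiDAR and pixel features.

    point_feats, pixel_feats: arrays of shape (n_points, channels),
    aligned so row i of each describes the same 3D point.
    Returns fused features of the same shape.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, c = point_feats.shape
    # Hypothetical projection standing in for a learned attention MLP:
    # maps the concatenated features to one logit per modality.
    w = rng.standard_normal((2 * c, 2))
    logits = np.concatenate([point_feats, pixel_feats], axis=1) @ w  # (n, 2)
    # Numerically stable softmax over the two modalities.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    a = e / e.sum(axis=1, keepdims=True)                             # (n, 2)
    # Convex combination: each point blends its two modality features.
    return a[:, :1] * point_feats + a[:, 1:] * pixel_feats
```

Because the softmax weights are positive and sum to one, every fused feature lies between its LiDAR and pixel values; in the actual framework the projection is learned end-to-end and applied at multiple voxel levels.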
Pages: 5598-5611
Page count: 14