PPF-Det: Point-Pixel Fusion for Multi-Modal 3D Object Detection

Cited by: 6
Authors
Xie, Guotao [1 ,2 ]
Chen, Zhiyuan [1 ]
Gao, Ming [1 ,2 ]
Hu, Manjiang [1 ,2 ]
Qin, Xiaohui [1 ,2 ]
Affiliations
[1] Hunan Univ, Coll Mech & Vehicle Engn, State Key Lab Adv Design & Mfg Technol Vehicle, Changsha 410082, Peoples R China
[2] Hunan Univ, Wuxi Intelligent Control Res Inst, Wuxi 214115, Jiangsu, Peoples R China
Keywords
Autonomous driving; 3D object detection; camera-LiDAR fusion; intelligent transportation systems;
DOI
10.1109/TITS.2023.3347078
Chinese Library Classification (CLC)
TU [Building Science]
Discipline Classification Code
0813
Abstract
Multi-modal fusion can take advantage of the LiDAR and the camera to boost the robustness and performance of 3D object detection. However, it remains challenging to comprehensively exploit image information and to perform accurate interaction and fusion of diverse features. In this paper, we propose a novel multi-modal framework, Point-Pixel Fusion for Multi-Modal 3D Object Detection (PPF-Det). PPF-Det consists of three submodules that address these problems: Multi Pixel Perception (MPP), Shared Combined Point Feature Encoder (SCPFE), and Point-Voxel-Wise Triple Attention Fusion (PVW-TAF). First, MPP makes full use of image semantic information to mitigate the resolution mismatch between the point cloud and the image. Second, SCPFE performs preliminary extraction of point cloud features and point-pixel features simultaneously, reducing time-consuming computation in 3D space. Finally, PVW-TAF is a fine-alignment fusion strategy that generates multi-level voxel-fused features based on an attention mechanism. Extensive experiments on the KITTI benchmark, conducted on September 24, 2023, demonstrate that our method achieves excellent performance.
Pages: 5598-5611
Number of pages: 14
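The abstract describes relating LiDAR points to image pixels so that per-point pixel features can be gathered and then fused with point features through an attention mechanism. The following minimal sketch (PyTorch) illustrates only that general point-pixel fusion idea; it is not the authors' implementation, and the projection matrix, feature dimensions, nearest-pixel gathering, and the PointPixelAttentionFusion module are assumptions made for illustration.

# Minimal illustrative sketch of point-pixel fusion (NOT the PPF-Det code).
# Assumptions: a 3x4 LiDAR-to-image projection matrix, 64-dim features for
# both modalities, nearest-pixel gathering, and a simple sigmoid attention
# weight applied to the sampled pixel features.
import torch
import torch.nn as nn

def project_points_to_image(points_xyz, proj_matrix):
    """Project N x 3 LiDAR points to pixel coordinates with a 3 x 4 matrix."""
    n = points_xyz.shape[0]
    homo = torch.cat([points_xyz, torch.ones(n, 1)], dim=1)   # (N, 4) homogeneous coords
    cam = homo @ proj_matrix.T                                 # (N, 3) image-plane coords
    return cam[:, :2] / cam[:, 2:3].clamp(min=1e-6)            # (N, 2) pixel (u, v)

class PointPixelAttentionFusion(nn.Module):
    """Hypothetical attention-weighted fusion of point and pixel features."""
    def __init__(self, point_dim=64, pixel_dim=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(point_dim + pixel_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())
        self.out = nn.Linear(point_dim + pixel_dim, point_dim)

    def forward(self, point_feat, pixel_feat):
        joint = torch.cat([point_feat, pixel_feat], dim=-1)    # (N, Dp + Di)
        weight = self.score(joint)                              # (N, 1) attention weight
        fused = torch.cat([point_feat, weight * pixel_feat], dim=-1)
        return self.out(fused)                                  # (N, Dp) fused per-point feature

if __name__ == "__main__":
    torch.manual_seed(0)
    pts = torch.rand(1024, 3) * 40.0                            # fake LiDAR points
    proj = torch.tensor([[720., 0., 620., 45.],                 # fake 3x4 projection matrix
                         [0., 720., 190., 0.],
                         [0., 0., 1., 0.]])
    uv = project_points_to_image(pts, proj)
    img_feat = torch.rand(64, 376, 1248)                        # fake C x H x W image feature map
    u = uv[:, 0].long().clamp(0, 1247)                          # nearest-pixel column indices
    v = uv[:, 1].long().clamp(0, 375)                           # nearest-pixel row indices
    pixel_feat = img_feat[:, v, u].T                            # (N, 64) gathered pixel features
    point_feat = torch.rand(1024, 64)                           # stand-in point features
    fused = PointPixelAttentionFusion()(point_feat, pixel_feat)
    print(fused.shape)                                          # torch.Size([1024, 64])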