2D Instance-Guided Pseudo-LiDAR Point Cloud for Monocular 3D Object Detection

Citations: 0
Authors
Gao, Rui [1]
Kim, Junoh [2]
Cho, Kyungeun [3]
Affiliations
[1] Dongguk University, Department of Multimedia Engineering, Seoul
[2] Dongguk University, NUI/NUX Platform Research Center, Seoul
[3] Dongguk University, Division of AI Software Convergence, Seoul
Funding
National Research Foundation of Singapore
Keywords
Autonomous driving; deep learning; monocular 3D object detection
DOI
10.1109/ACCESS.2024.3514673
Abstract
Monocular three-dimensional (3D) scene understanding tasks, such as estimating object size, angle, and 3D position, are challenging to perform. The most successful current methods usually require data from 3D sensors, so their performance is limited when only monocular images are available because of the lack of distance information. In recent years, 3D object detection methods based on pseudo-LiDAR have effectively improved the accuracy of 3D prediction. However, most pseudo-LiDAR methods directly reuse LiDAR-based 3D detection networks, which also limits their performance. In this paper, a new monocular 3D object detection framework is proposed. By redesigning the pseudo-LiDAR representation, a 2D detection mask channel is introduced as a guide layer for 3D object detection, and a new 3D detection network is designed for this multi-channel data. The pillar feature encoding module is optimized with the 2D detection mask and a height histogram, so that point clouds are converted into pillar features more effectively by exploiting the mask and the height characteristics of different objects. A new feature extraction network built on transformer modules extracts the pillar features, and the detection head is optimized as three independent heads that predict the position, category attributes, and specific box parameters of each object. An evaluation on the challenging KITTI dataset demonstrates that the proposed method significantly improves upon state-of-the-art monocular methods. © 2013 IEEE.
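To make the multi-channel representation in the abstract concrete, the following is a minimal Python sketch (not the authors' released code) of how a dense depth map can be back-projected into a pseudo-LiDAR point cloud and augmented with a 2D instance-mask channel as a guide layer. The function names (depth_to_pseudo_lidar, add_mask_channel), the pinhole-intrinsics assumption, and the toy inputs are illustrative assumptions, not details taken from the paper.

import numpy as np

def depth_to_pseudo_lidar(depth, K):
    # Back-project a dense depth map (H, W) into camera-frame 3D points
    # using pinhole intrinsics K (3x3). Returns an (H*W, 3) array.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]  # (u - cx) * z / fx
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]  # (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def add_mask_channel(points, instance_mask):
    # Append each pixel's 2D instance-mask value as a fourth per-point
    # channel, yielding a mask-guided (H*W, 4) pseudo-LiDAR cloud.
    guide = (instance_mask.reshape(-1) > 0).astype(points.dtype)
    return np.concatenate([points, guide[:, None]], axis=1)

# Toy usage with synthetic data (all values illustrative).
depth = np.random.uniform(1.0, 60.0, size=(8, 16)).astype(np.float32)
mask = np.zeros((8, 16), dtype=np.uint8)
mask[2:5, 4:9] = 1  # pretend a 2D detector marked this region as an object
K = np.array([[721.5, 0.0, 8.0], [0.0, 721.5, 4.0], [0.0, 0.0, 1.0]])
cloud = add_mask_channel(depth_to_pseudo_lidar(depth, K), mask)
print(cloud.shape)  # (128, 4): x, y, z, instance-mask guide flag

In the framework described by the abstract, such an extra channel is what lets the downstream pillar encoder distinguish foreground from background points; in this sketch it is reduced to a binary flag.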
Pages: 187813-187827
Number of pages: 14