Three-Dimensional Object Detection Technology Based on Point Cloud Data

Cited by: 21
Authors
Li Jianan [1,2]
Wang Ze [1]
Xu Tingfa [1,2,3]
Affiliations
[1] Beijing Inst Technol, Sch Optoelect, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Key Lab Photoelect Imaging Technol & Syst, Minist Educ, Beijing 100081, Peoples R China
[3] Beijing Inst Technol, Chongqing Innovat Ctr, Chongqing 401135, Peoples R China
Keywords
point cloud; 3D object detection; single modality; multi-modality; contextual classification
DOI
10.3788/AOS230745
CLC number
O43 [Optics];
Discipline codes
070207; 0803
Abstract
Significance
In recent years, self-driving technology has garnered considerable attention from both academia and industry. Autonomous perception, which encompasses perception of the vehicle's own state and of the surrounding environment, is a critical component of self-driving technology, guiding the decision-making and planning modules. Accurate environmental perception requires detecting objects in three-dimensional (3D) scenes. Traditional object detection techniques, however, are typically based on image data, which lack depth information, making them ill-suited to 3D scene tasks. Therefore, 3D object detection predominantly relies on point cloud data obtained from devices such as lidar and 3D scanners. A point cloud is a collection of points, each carrying coordinates and additional attributes such as color, normal vector, and intensity, and it is rich in depth information. In contrast to two-dimensional images, however, point clouds are sparse, unordered, and irregular in structure, which complicates feature extraction. Traditional methods rely on local point cloud properties such as curvature, normal vector, and density, combined with techniques such as the Gaussian model, to hand-craft descriptors for processing point cloud data. These methods depend heavily on a priori knowledge and ignore the relationships between neighboring points, resulting in low robustness and susceptibility to noise. In recent years, deep learning methods have attracted significant attention from researchers owing to their strong feature representation and generalization capabilities. Their effectiveness depends heavily on high-quality datasets; to advance point cloud object detection, companies such as Waymo and Baidu, as well as research institutes, have produced large-scale point cloud datasets. With the help of such datasets, point cloud object detection combined with deep learning has developed rapidly and demonstrated powerful performance. Nevertheless, challenges in accuracy and real-time performance remain. This paper therefore reviews the research on point cloud object detection and looks ahead to future developments in order to promote the advancement of this field.

Progress
The development of point cloud object detection has been significantly promoted by the recent emergence of large-scale open-source datasets. Several standard datasets for outdoor scenes, including KITTI, Waymo, and nuScenes, and for indoor scenes, including NYU-Depth, SUN RGB-D, and ScanNet, have been released, greatly facilitating research in this field; their relevant properties are summarized in Table 1. Point cloud data are characterized by sparsity, non-uniformity, and disorder, which distinguish them from image data. To address these properties, researchers have developed a range of object detection algorithms designed specifically for point clouds. Based on the method of feature extraction, point cloud-based single-modal methods can be categorized into four groups: voxel-based, point-based, graph-based, and point+voxel-based methods. Voxel-based methods divide the point cloud into regular voxel grids and aggregate the point features within each voxel to generate regular four-dimensional feature maps; VoxelNet, SECOND, and PointPillars are classic architectures of this kind.
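To make the voxelization step concrete, the following Python sketch (NumPy only) bins a point cloud into a regular grid and mean-pools the points that share a voxel. The grid bounds, voxel size, and mean pooling are illustrative assumptions, not the exact pipeline of the named methods, which replace the fixed pooling with small learned networks.

import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4),
             pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
    # points: (N, 4) array of x, y, z, intensity in the lidar frame.
    # voxel_size and pc_range are illustrative placeholders.
    lo, hi = np.array(pc_range[:3]), np.array(pc_range[3:])
    size = np.array(voxel_size)
    # Keep only points inside the detection range.
    mask = np.all((points[:, :3] >= lo) & (points[:, :3] < hi), axis=1)
    points = points[mask]
    # Integer voxel coordinate of every remaining point.
    coords = ((points[:, :3] - lo) / size).astype(np.int64)
    # Group points that fall into the same voxel and mean-pool them.
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    inv = inv.reshape(-1)  # guard against NumPy version differences
    feats = np.zeros((len(uniq), points.shape[1]))
    np.add.at(feats, inv, points)  # scatter-add the points of each voxel
    feats /= np.bincount(inv, minlength=len(uniq))[:, None]
    return uniq, feats  # occupied voxel indices and their pooled features

# Example: a random 10000-point cloud with an intensity channel.
cloud = np.random.rand(10000, 4) * [70.4, 80.0, 4.0, 1.0] + [0.0, -40.0, -3.0, 0.0]
voxels, voxel_feats = voxelize(cloud)

Because only occupied voxels are kept, the output is a sparse set of (index, feature) pairs that a detector can scatter into a dense grid for convolution.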
Point-based methods process the point cloud directly and use symmetric functions to aggregate point features while retaining the geometric information of the point cloud to the greatest extent; PointNet, PointNet++, and Point R-CNN are the classic architectures.
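A minimal PyTorch sketch of this symmetric-function idea, in the spirit of PointNet: a weight-shared MLP embeds each point independently, and max pooling over the point axis, being symmetric, makes the aggregated feature invariant to point order. The layer widths are illustrative assumptions.

import torch
import torch.nn as nn

class PointFeatureNet(nn.Module):
    def __init__(self, in_dim=3, feat_dim=256):
        super().__init__()
        # The same MLP is applied to every point (shared weights).
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, pts):           # pts: (B, N, 3)
        per_point = self.mlp(pts)     # (B, N, feat_dim)
        # Max over the point axis: a symmetric, order-invariant aggregation.
        return per_point.max(dim=1).values  # (B, feat_dim)

net = PointFeatureNet()
x = torch.rand(2, 1024, 3)
# Shuffling the points does not change the aggregated feature.
assert torch.allclose(net(x), net(x[:, torch.randperm(1024)]))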
Graph-based methods convert the point cloud into a graph representation and process it with a graph neural network; Point-GNN and Graph R-CNN are classic architectures of this approach.
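The graph construction itself can be as simple as connecting each point to its k nearest neighbours, which is the kind of input graph such detectors consume. The NumPy sketch below builds one, with relative offsets as edge features; the value of k and the edge-feature choice are illustrative assumptions.

import numpy as np

def knn_graph(points, k=8):
    # Directed edges i -> j to the k nearest neighbours j of each point i.
    # Dense O(N^2) distances for clarity; real systems use spatial indexing.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(-1)              # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)          # forbid self-loops
    nbrs = np.argpartition(dist2, k, axis=1)[:, :k]
    src = np.repeat(np.arange(len(points)), k)
    dst = nbrs.reshape(-1)
    edge_feats = points[dst] - points[src]   # relative position along each edge
    return np.stack([src, dst]), edge_feats

pts = np.random.rand(512, 3)
edges, feats = knn_graph(pts)                # edges: (2, 512 * 8)

A graph neural network then updates each node by aggregating messages from its neighbours along these edges.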
Point+voxel-based methods combine point-based and voxel-based processing, with STD and PV R-CNN as classic architectures. In addition, to enrich the semantic information of point cloud data, researchers have designed multi-modal methods that use image data as a complementary source; MV3D, AVOD, and MMF are classic multi-modal architectures.
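The geometric step underlying these multi-modal methods is projecting lidar points into the image plane so that each point can be associated with image pixels or features. The sketch below shows this projection; the calibration matrices here are placeholders, whereas real datasets such as KITTI supply them per frame.

import numpy as np

def project_to_image(points, K, T_cam_from_lidar):
    # points: (N, 3) in the lidar frame; K: (3, 3) camera intrinsics;
    # T_cam_from_lidar: (4, 4) extrinsic transform (camera <- lidar).
    homo = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]             # camera frame
    in_front = cam[:, 2] > 0.1                 # drop points behind the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                # perspective divide
    return uv, in_front

# Placeholder calibration: assumed intrinsics, identity extrinsics.
K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
pts = np.random.rand(1000, 3) * [20.0, 20.0, 40.0] + [-10.0, -10.0, 1.0]
uv, visible = project_to_image(pts, K, T)
pixels = uv[visible]  # image locations whose features can decorate the points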
A chronological summary of the classical methods for object detection from point clouds is presented in Fig. 4.

Conclusions and Prospects
3D object detection from point clouds is a significant research area in computer vision that is attracting increasing attention from scholars. Its foundational branches have flourished, and future research may focus on several directions, including multi-branch and multi-modal fusion, the integration of two-dimensional detection methods, weakly supervised and self-supervised learning, and the creation and utilization of more complex datasets.

Pages: 17
References
82 in total
[1] Bai Xuyang, Hu Zeyu, Zhu Xinge, Huang Qingqiu, Chen Yilun, Fu Hangbo, Tai Chiew-Lan. TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 1080-1089.
[2] Bi Yin, Chadha Aaron, Abbas Alhabib, Bourtsoulatze Eirina, Andreopoulos Yiannis. Graph-Based Object Classification for Neuromorphic Vision Sensing. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 491-501.
[3] Caesar Holger, Bankiti Varun, Lang Alex H., Vora Sourabh, Liong Venice Erin, Xu Qiang, Krishnan Anush, Pan Yu, Baldan Giancarlo, Beijbom Oscar. nuScenes: A multimodal dataset for autonomous driving. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 11618-11628.
[4] Chen C. AAAI Conference on Artificial Intelligence, 2022: 221.
[5] Chen Xiaoxue, Zhao Hao, Zhou Guyue, Zhang Ya-Qin. PQ-Transformer: Jointly Parsing 3D Objects and Layouts From Point Clouds. IEEE Robotics and Automation Letters, 2022, 7(2): 2519-2526.
[6] Chen Xiaozhi, Ma Huimin, Wan Ji, Li Bo, Xia Tian. Multi-View 3D Object Detection Network for Autonomous Driving. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 6526-6534.
[7] Chen X Y. arXiv:2203.10642, 2023.
[8] Chen Y L. IEEE International Conference on Computer Vision, 2019: 9774. DOI: 10.1109/ICCV.2019.00987.
[9] Cong Peishan, Zhu Xinge, Qiao Feng, Ren Yiming, Peng Xidong, Hou Yuenan, Xu Lan, Yang Ruigang, Manocha Dinesh, Ma Yuexin. STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 19576-19585.
[10] Dai Angela, Chang Angel X., Savva Manolis, Halber Maciej, Funkhouser Thomas, Niessner Matthias. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 2432-2443.