Dynamic Scene Understanding for Autonomous Driving Using 2D-3D Convolution With Voxel Key Points

Cited by: 1

Authors:

Liu, Kunhua [1]

Zheng, Yi [1]

Xie, Junkun [2]

Xie, Yuting [2]

Wang, Feiyang [1]

Ma, Longyan [1]

Dai, Chenggang [1]

Lu, Tao [1]

Affiliations:

[1] Qingdao Univ Technol, Sch Mech & Automot Engn, Qingdao 266520, Peoples R China

[2] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510275, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2024

Keywords:

Feature extraction; Three-dimensional displays; Point cloud compression; Semantic segmentation; Semantics; Convolution; Computational efficiency; Autonomous vehicles; Aggregates; Fuses; spatial-temporal fusion; autonomous driving

DOI:

10.1109/TITS.2024.3510800

Chinese Library Classification (CLC) number:

TU [Building Science]

Discipline classification code:

0813

Abstract:

With the growing emphasis on real-time 3D data processing in autonomous driving, robotics, and intelligent vehicles, the demand for efficient point cloud processing has expanded significantly. Early deep learning approaches to point cloud semantic segmentation relied on volumetric grids and projections, which often compromised the inherent geometric structure of point clouds. More recent methods attempt to learn directly from raw point clouds, focusing on local neighborhood information; however, optimizing computational efficiency for dynamic scenes remains a challenge. This paper presents a novel 2D-3D convolutional framework, VKPNet, for point cloud semantic segmentation that leverages Voxel Key Points (VKPs) to efficiently aggregate local features and enhance receptive fields. The proposed approach first initializes 3D point cloud features using 2D image features and applies a heuristic method to filter 3D points, extracting only those necessary for semantic segmentation, thereby reducing the input data scale. VKPs are introduced to aggregate local features at voxel cube vertices, and a 3D convolution based on VKPs is designed to expand the receptive field, facilitating effective spatiotemporal feature learning. Experimental results on the ScanNet and Semantic KITTI datasets validate the effectiveness of our VKPNet model. The framework achieves mIoU scores of 0.735 on the ScanNet dataset and 0.689 on the Semantic KITTI dataset, with a processing speed of 0.09 seconds per frame on ScanNet. These results demonstrate that VKPNet not only outperforms prior methods across various benchmarks but also achieves efficient and accurate semantic segmentation in dynamic scenes.
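The abstract states that VKPs aggregate local features at voxel cube vertices but does not give the aggregation rule. The following is a minimal sketch of that idea only, not the authors' implementation. The function name `aggregate_to_voxel_vertices`, the `voxel_size` parameter, and the use of unweighted mean pooling are all assumptions made for illustration.

```python
import numpy as np


def aggregate_to_voxel_vertices(points, feats, voxel_size):
    """Mean-pool per-point features onto the 8 corners of each point's voxel.

    Hypothetical helper: the pooling rule (plain average) is an assumption,
    not something the abstract specifies.

    points: (N, 3) float array of xyz coordinates
    feats:  (N, C) float array of per-point features
    Returns (vertex_xyz, vertex_feats) with shapes (M, 3) and (M, C).
    """
    # Integer grid index of the voxel that contains each point.
    base = np.floor(points / voxel_size).astype(np.int64)  # (N, 3)

    # Offsets from a voxel's minimum corner to all 8 of its corners.
    offsets = np.array(
        [[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
        dtype=np.int64,
    )  # (8, 3)

    # Every point contributes its feature to the 8 corners of its voxel.
    corners = (base[:, None, :] + offsets[None, :, :]).reshape(-1, 3)  # (N*8, 3)
    corner_feats = np.repeat(feats, 8, axis=0)  # (N*8, C)

    # Corners shared by neighbouring voxels collapse into one vertex.
    vertices, inverse = np.unique(corners, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Scatter-add the features, then divide by the number of contributions.
    sums = np.zeros((len(vertices), feats.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, corner_feats)
    counts = np.bincount(inverse, minlength=len(vertices))

    return vertices * voxel_size, sums / counts[:, None]
```

Under these assumptions, two points that fall in adjacent unit voxels produce 12 distinct vertices rather than 16, because the 4 vertices on the shared face receive (and average) features from both points. That sharing is one plausible way vertex-based aggregation could widen the receptive field beyond a single voxel.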

References

Pages: 11

46 entries in total

[1] Rethinking Few-shot 3D Point Cloud Semantic Segmentation. An, Zhaochong; Sun, Guolei; Liu, Yun; Liu, Fayao; Wu, Zongwei; Wang, Dan; Van Gool, Luc; Belongie, Serge. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, pp. 3996-4006

[2] SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. Behley, Jens; Garbade, Martin; Milioto, Andres; Quenzel, Jan; Behnke, Sven; Stachniss, Cyrill; Gall, Juergen. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, pp. 9296-9306

[3] RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds. Chen, Tzu-Hsuan; Chang, Tian Sheuan. IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2022, 7 (1), pp. 93-101

[4] 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. Choy, Christopher; Gwak, JunYoung; Savarese, Silvio. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, pp. 3070-3079

[5] Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review. Cui, Yaodong; Chen, Ren; Chu, Wenbo; Chen, Long; Tian, Daxin; Li, Ying; Cao, Dongpu. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (2), pp. 722-739

[6] ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. Dai, Angela; Chang, Angel X.; Savva, Manolis; Halber, Maciej; Funkhouser, Thomas; Niessner, Matthias. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, pp. 2432-2443

[7] RdmkNet & Toronto-Rdmk: Large-Scale Datasets for Road Marking Classification and Segmentation. Du, Jing; Ma, Lingfei; Li, Jing; Qin, Nannan; Zelek, John; Guan, Haiyan; Li, Jonathan. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (10), pp. 13467-13482

[8] 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks. Graham, Benjamin; Engelcke, Martin; van der Maaten, Laurens. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, pp. 9224-9232

[9] He Y., 2022, P EUR C COMP VIS, P726

[10] 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans. Hou, Ji; Dai, Angela; Niessner, Matthias. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, pp. 4416-4425
