PSNet: Patch-Based Self-Attention Network for 3D Point Cloud Semantic Segmentation

Cited by: 0
Authors
Yi, Hong [1]
Liu, Yaru [2]
Wang, Ming [3,4]
Affiliations
[1] Harbin Normal Univ, Coll Geog Sci, Harbin 150025, Peoples R China
[2] Guangdong Urban Rural Planning & Design Res Inst Technol Grp Co Ltd, Guangzhou 510290, Peoples R China
[3] Inspur Cloud Informat Technol Co Ltd, Jinan 250101, Peoples R China
[4] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China
Keywords
patch-based self-attention; 3D point cloud; semantic segmentation; receptive field
DOI
10.3390/rs17122012
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science]
Subject Classification
08; 0830
Abstract
LiDAR-captured 3D point clouds are widely used in self-driving cars and smart cities. Point-based semantic segmentation methods make more efficient use of the rich geometric information contained in 3D point clouds, so they have gradually replaced other approaches as the mainstream deep learning methods for 3D point cloud semantic segmentation. However, existing methods suffer from limited receptive fields and from feature misalignment caused by hierarchical downsampling. To address these challenges, we propose PSNet, a novel patch-based self-attention network that significantly expands the receptive field while ensuring feature alignment through a patch-aggregation paradigm. PSNet combines patch-based self-attention feature extraction with common point feature aggregation (CPFA) to implicitly model large-scale spatial relationships. The framework first divides the point cloud into overlapping patches and extracts local features within each patch via multi-head self-attention; it then aggregates the features of points shared by multiple patches to capture long-range context. Extensive experiments on the Toronto-3D and Complex Scene Point Cloud (CSPC) datasets validate PSNet's state-of-the-art performance, with overall accuracies (OAs) of 98.4% and 97.2%, respectively, and significant improvements in challenging categories (e.g., +32.1% IoU for fences). On the S3DIS dataset, PSNet attains competitive mIoU (71.2%) while maintaining lower inference latency (7.03 s). The PSNet architecture achieves larger receptive field coverage, a significant advantage over existing methods. This work not only reveals how patch-based self-attention enlarges the receptive field but also offers insights into attention-based 3D geometric learning and semantic segmentation architectures, and it provides a substantial reference for applications in autonomous vehicle navigation and smart city infrastructure management.
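To make the pipeline concrete, here is a minimal PyTorch sketch of the two stages described above: per-patch multi-head self-attention followed by common point feature aggregation (CPFA). This is an illustrative reconstruction from the abstract alone, not the published implementation; the patch-division strategy (k-nearest neighbours around randomly sampled centroids), the function and parameter names, and the mean-based aggregation are all assumptions.

import torch
import torch.nn as nn

def make_overlapping_patches(points, num_patches, patch_size):
    # Assumption: patches are k-nearest-neighbour groups around sampled
    # centroids, so nearby patches naturally share (common) points.
    centroids = points[torch.randperm(points.shape[0])[:num_patches]]  # (P, 3)
    dists = torch.cdist(centroids, points)                 # (P, N)
    return dists.topk(patch_size, largest=False).indices   # (P, K) point indices

class PatchSelfAttention(nn.Module):
    # Multi-head self-attention applied independently inside each patch.
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_feats):                        # (P, K, C)
        out, _ = self.attn(patch_feats, patch_feats, patch_feats)
        return self.norm(out + patch_feats)                # residual + norm

def cpfa(patch_feats, idx, num_points, dim):
    # Common point feature aggregation: a point that falls inside several
    # overlapping patches averages its per-patch features, implicitly
    # passing context between patches (the enlarged receptive field).
    agg = torch.zeros(num_points, dim)
    cnt = torch.zeros(num_points, 1)
    flat = idx.reshape(-1)                                 # (P*K,)
    agg.index_add_(0, flat, patch_feats.reshape(-1, dim))
    cnt.index_add_(0, flat, torch.ones(flat.shape[0], 1))
    return agg / cnt.clamp(min=1)                          # mean per point

# Toy usage with assumed sizes: 4096 points, 64 patches of 256 points each.
N, P, K, C = 4096, 64, 256, 64
pts = torch.rand(N, 3)
feats = nn.Linear(3, C)(pts)                               # (N, C) embedding
idx = make_overlapping_patches(pts, P, K)                  # (P, K)
patch_feats = PatchSelfAttention(C)(feats[idx])            # (P, K, C)
fused = cpfa(patch_feats, idx, N, C)                       # (N, C), aligned

Because every point keeps its identity across the patches it belongs to, the aggregated features stay aligned with the original points, which matches the feature-alignment property the abstract attributes to the patch-aggregation paradigm.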
Pages: 20