PSNet: Patch-Based Self-Attention Network for 3D Point Cloud Semantic Segmentation

Times cited: 0

Authors:

Yi, Hong [1]

Liu, Yaru [2]

Wang, Ming [3,4]

Affiliations:

[1] Harbin Normal Univ, Coll Geog Sci, Harbin 150025, Peoples R China

[2] Guangdong Urban Rural Planning & Design Res Inst Technol Grp Co Ltd, Guangzhou 510290, Peoples R China

[3] Inspur Cloud Informat Technol Co Ltd, Jinan 250101, Peoples R China

[4] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China

Source:

REMOTE SENSING | 2025 / Vol. 17 / Issue 12

Keywords:

patch-based self-attention; 3D point cloud; semantic segmentation; receptive field

DOI:

10.3390/rs17122012

CLC number (Chinese Library Classification):

X [Environmental Science, Safety Science]

Discipline code:

08; 0830

Abstract:

LiDAR-captured 3D point clouds are widely used in self-driving cars and smart cities. Point-based semantic segmentation methods make more efficient use of the rich geometric information contained in 3D point clouds and have therefore gradually replaced other approaches as the mainstream deep learning paradigm for 3D point cloud semantic segmentation. However, existing methods suffer from limited receptive fields and from feature misalignment caused by hierarchical downsampling. To address these challenges, we propose PSNet, a novel patch-based self-attention network that significantly expands the receptive field while ensuring feature alignment through a patch-aggregation paradigm. PSNet combines patch-based self-attention feature extraction with common point feature aggregation (CPFA) to implicitly model large-scale spatial relationships. The framework first divides the point cloud into overlapping patches and extracts local features within each patch via multi-head self-attention; it then aggregates the features of points shared by several patches (common points) to capture long-range context. Extensive experiments on the Toronto-3D and Complex Scene Point Cloud (CSPC) datasets validate PSNet's state-of-the-art performance, with overall accuracies (OAs) of 98.4% and 97.2%, respectively, and significant improvements in challenging categories (e.g., +32.1% IoU for fences). On the S3DIS dataset, PSNet attains a competitive mIoU of 71.2% while maintaining a lower inference latency (7.03 s). PSNet also covers a larger receptive field than existing methods, which is a significant advantage. This work not only reveals how patch-based self-attention enhances the receptive field but also provides insights into attention-based 3D geometric learning and semantic segmentation architectures. Furthermore, it offers a useful reference for applications in autonomous vehicle navigation and smart city infrastructure management.
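The two-stage pipeline described in the abstract (self-attention inside overlapping patches, then fusing the features of points that several patches share) can be illustrated with a minimal NumPy sketch. It rests on assumptions that do not come from the paper: patches are built with farthest point sampling plus k-nearest neighbours, the attention weights are random rather than learned, and CPFA is approximated by simply averaging each shared point's features over the patches that contain it. The function names (`overlapping_patches`, `multi_head_self_attention`, `common_point_feature_aggregation`, `psnet_like_block`) are hypothetical and only illustrate the idea; this is not the authors' implementation.

```python
import numpy as np

def farthest_point_sampling(xyz, n_centers, seed=0):
    """Pick n_centers well-spread point indices (ASSUMED way to seed patches)."""
    rng = np.random.default_rng(seed)
    n = xyz.shape[0]
    centers = np.empty(n_centers, dtype=np.int64)
    centers[0] = rng.integers(n)
    dist = np.full(n, np.inf)
    for i in range(1, n_centers):
        # Distance of every point to its nearest already-chosen center.
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[centers[i - 1]], axis=1))
        centers[i] = int(np.argmax(dist))
    return centers

def overlapping_patches(xyz, n_centers, patch_size):
    """Hypothetical: each patch = a center plus its nearest neighbours; patches may share points."""
    centers = farthest_point_sampling(xyz, n_centers)
    d = np.linalg.norm(xyz[centers][:, None, :] - xyz[None, :, :], axis=-1)  # (P, N)
    return np.argsort(d, axis=1)[:, :patch_size]  # (P, K) point indices

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Standard scaled dot-product multi-head attention over the K points of one patch."""
    k_pts, d = x.shape
    dh = d // n_heads  # assumes d is divisible by n_heads

    def split(t):  # (K, d) -> (heads, K, dh)
        return t.reshape(k_pts, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, K, K)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = (attn @ v).transpose(1, 0, 2).reshape(k_pts, d)  # concatenate heads
    return out @ w_o

def common_point_feature_aggregation(patch_idx, patch_feats, n_points):
    """ASSUMED CPFA stand-in: average the features a point receives from every patch containing it."""
    d = patch_feats.shape[-1]
    summed = np.zeros((n_points, d))
    counts = np.zeros(n_points)
    np.add.at(summed, patch_idx.ravel(), patch_feats.reshape(-1, d))  # unbuffered scatter-add
    np.add.at(counts, patch_idx.ravel(), 1)
    covered = counts > 0
    summed[covered] /= counts[covered][:, None]  # uncovered points stay zero
    return summed, counts

def psnet_like_block(xyz, feats, n_centers=8, patch_size=32, n_heads=4, seed=0):
    """Hypothetical block: patch-wise attention followed by shared-point fusion (random weights)."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    w_q, w_k, w_v, w_o = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
    patch_idx = overlapping_patches(xyz, n_centers, patch_size)  # (P, K)
    patch_feats = np.stack([
        multi_head_self_attention(feats[idx], w_q, w_k, w_v, w_o, n_heads)
        for idx in patch_idx
    ])  # (P, K, d)
    return common_point_feature_aggregation(patch_idx, patch_feats, xyz.shape[0])

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    xyz = rng.uniform(size=(200, 3))
    feats = rng.normal(size=(200, 16))
    out, counts = psnet_like_block(xyz, feats)
    print(out.shape, int(counts.max()))
```

With 8 patches of 32 points drawn from 200 points, at least one point must fall into two or more patches, so the averaging step genuinely fuses context from different patches; in this sketch that overlap is the only channel through which information crosses patch boundaries.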

References

Pages: 20

46 references in total

[31] SPLATNet: Sparse Lattice Networks for Point Cloud Processing [J].

Su, Hang; Jampani, Varun; Sun, Deqing; Maji, Subhransu; Kalogerakis, Evangelos; Yang, Ming-Hsuan; Kautz, Jan.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 2530-2539

[32] Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways [J].

Tan, Weikai; Qin, Nannan; Ma, Lingfei; Li, Ying; Du, Jing; Cai, Guorong; Yang, Ke; Li, Jonathan.

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020: 797-806

[33] Tangent Convolutions for Dense Prediction in 3D [J].

Tatarchenko, Maxim; Park, Jaesik; Koltun, Vladlen; Zhou, Qian-Yi.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 3887-3896

[34] KPConv: Flexible and Deformable Convolution for Point Clouds [J].

Thomas, Hugues; Qi, Charles R.; Deschaud, Jean-Emmanuel; Marcotegui, Beatriz; Goulette, Francois; Guibas, Leonidas J.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6420-6429

[35] CSPC-Dataset: New LiDAR Point Cloud Dataset and Benchmark for Large-Scale Scene Semantic Segmentation [J].

Tong, Guofeng; Li, Yong; Chen, Dong; Sun, Qi; Cao, Wei; Xiang, Guiqiu.

IEEE ACCESS, 2020, 8: 87695-87718

[36] Vaswani, A.

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2017, 30

[37] Inner Attention based Recurrent Neural Networks for Answer Selection [J].

Wang, Bingning; Liu, Kang; Zhao, Jun.

PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016: 1288-1297

[38] Dynamic Graph CNN for Learning on Point Clouds [J].

Wang, Yue; Sun, Yongbin; Liu, Ziwei; Sarma, Sanjay E.; Bronstein, Michael M.; Solomon, Justin M.

ACM TRANSACTIONS ON GRAPHICS, 2019, 38 (05)

[39] A novel splice variant of goat CPT1a gene and their diverse mRNA expression profiles [J].

Wu, Xian-feng; Liu, Yuan; Zhan, Jin-shun; Huang, Qin-lou; Li, Wen-yang.

ANIMAL BIOTECHNOLOGY, 2023, 34 (07): 2571-2581

[40] Linking Points With Labels in 3D: A Review of Point Cloud Semantic Segmentation [J].

Xie, Yuxing; Tian, Jiaojiao; Zhu, Xiao Xiang.

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2020, 8 (04): 38-59
