Point Mask Transformer for Outdoor Point Cloud Semantic Segmentation

Times cited: 0
Authors
Li, Xiangqian [1]
Tan, Xin [1]
Zhang, Zhizhong [1]
Xie, Yuan [1]
Ma, Lizhuang [1]
Affiliations
[1] Sch Comp Sci & Technol, East China Normal Univ, Shanghai 200062, Peoples R China
Source
COMPUTATIONAL VISUAL MEDIA | 2025 / Vol. 11 / Issue 03
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai
Keywords
Point cloud compression; Transformers; Semantic segmentation; Three-dimensional displays; Feature extraction; Decoding; Computer architecture; Convolution; Vectors; Transforms; point cloud; deep learning; semantic segmentation; transformer;
DOI
10.26599/CVM.2025.9450388
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Current outdoor point-cloud segmentation methods typically formulate semantic segmentation as a per-point/voxel-classification task. Although this strategy is straightforward because it classifies each point directly, it ignores the overall relationship of the category. As an alternative paradigm, mask classification decouples category classification from region localization, allowing the model to better capture overall category relationships. In this paper, we propose a novel approach called the point mask transformer (PMFormer), which transforms the semantic segmentation of point clouds from per-point classification to mask classification using a transformer architecture. The proposed model comprises a 3D backbone, transformer decoder, and segmentation head that predicts a series of binary masks, each associated with a global class label. Furthermore, to accommodate the unique characteristics of large and sparse outdoor point-cloud scenes, we propose three enhancements for the integration of point-cloud data with the transformer: MaskMix, 3D position encoding, and attention weights. We evaluate our model using the SemanticKITTI and nuScenes datasets. Our experimental results show that the proposed method outperforms state-of-the-art semantic segmentation approaches.
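The difference between per-point classification and mask classification can be illustrated with a minimal sketch. The code below is not PMFormer's implementation; it assumes the common MaskFormer-style inference rule, in which each query outputs a class distribution (with an extra "no object" column) and a binary mask over points, and the per-point label is obtained by summing class probability times mask probability over all queries. The function and argument names (`mask_classification_inference`, `class_logits`, `mask_logits`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mask_classification_inference(class_logits, mask_logits):
    """Turn Q (class, mask) predictions into one label per point.

    class_logits: (Q, C + 1) array; the last column is assumed to be
                  a "no object" class, as in MaskFormer-style models.
    mask_logits:  (Q, N) array; one binary-mask logit per query and point.
    Returns an (N,) array of class indices in [0, C).
    """
    # Drop the "no object" column after normalizing: shape (Q, C).
    class_probs = softmax(class_logits, axis=-1)[:, :-1]
    # Per-query probability that each point belongs to the mask: (Q, N).
    mask_probs = sigmoid(mask_logits)
    # Marginalize over queries: score[n, c] = sum_q mask[q, n] * class[q, c].
    point_scores = mask_probs.T @ class_probs
    return point_scores.argmax(axis=-1)
```

In this sketch, the category decision is made once per mask (the class distribution of a query) and the localization is made per point (the mask), which is the decoupling the abstract refers to.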
Pages: 497-511
Number of pages: 15