CSFNet: Cross-Modal Semantic Focus Network for Semantic Segmentation of Large-Scale Point Clouds

Times cited: 3

Authors:

Luo, Yang [1]

Han, Ting [2]

Liu, Yujun [3]

Su, Jinhe [1]

Chen, Yiping [2]

Li, Jinyuan [1]

Wu, Yundong [1]

Cai, Guorong [1]

Affiliations:

[1] Jimei Univ, Sch Comp Engn, Xiamen 361021, Peoples R China

[2] Sun Yat Sen Univ, Sch Geospatial Engn & Sci, Zhuhai 519082, Peoples R China

[3] Shenzhen Univ, Sch Architecture & Urban Planning, Shenzhen 518061, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2025 / Vol. 63

Funding:

National Natural Science Foundation of China;

Keywords:

Point cloud compression; Laser radar; Three-dimensional displays; Semantics; Feature extraction; Contrastive learning; Semantic segmentation; Roads; Transformers; Image color analysis; contrastive learning; point clouds; semantic focus; semantic segmentation; urban scenes;

DOI:

10.1109/TGRS.2025.3535800

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708; 070902;

Abstract:

Semantic segmentation of large-scale point clouds is an indispensable component of outdoor scene perception, providing essential 3-D semantic information for applications such as scene reconstruction, urban planning, and autonomous driving. However, the discriminative capability of point cloud features declines with increasing distance from the sensor, so current methods usually perform poorly when segmenting distant objects. To overcome this challenge and to improve the differentiation between classes with similar geometric features, we propose the cross-modal semantic focus network (CSFNet). First, we design a multiscale feature dynamic fusion (MDF) module that leverages multiscale image features, enriching the point cloud feature representation with additional color and texture information from images. Then, to extract discriminative features of distant objects and of different object categories more efficiently, we propose a semantic focus module (SFM) that employs a multiclass contrastive learning strategy to enhance feature discrimination. Finally, we introduce cross-modal knowledge distillation (KD) to augment the model's comprehension of point clouds. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate the effectiveness of our method. Notably, our method achieves superior segmentation accuracy across multiple classes at various distances compared with current methods.
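The abstract does not give the loss formulations, so the following is only a rough illustration of two generic building blocks of the kind it names: a supervised (multiclass) contrastive loss in the SupCon style, where same-class point features are positives and other-class features are negatives, and a temperature-scaled knowledge-distillation loss (Hinton-style KL divergence between softened teacher and student outputs). Neither function is claimed to be CSFNet's SFM or KD formulation; the function names, the temperatures `tau` and `T`, and the use of plain Python lists instead of tensors are assumptions made for a self-contained sketch.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def supervised_contrastive_loss(
    feats: Sequence[Sequence[float]], labels: Sequence[int], tau: float = 0.1
) -> float:
    """Generic SupCon-style multiclass contrastive loss (illustrative, not CSFNet's).

    For each anchor i, samples with the same class label are positives and all
    other samples act as negatives. Minimizing the loss pulls same-class
    features together and pushes different-class features apart.
    Anchors without any positive are skipped.
    """
    if len(feats) < 2:
        return 0.0
    z = [_normalize(f) for f in feats]
    total, count = 0.0, 0
    for i in range(len(z)):
        others = [j for j in range(len(z)) if j != i]
        # Temperature-scaled cosine similarity from anchor i to every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau for j in others}
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Mean negative log-probability assigned to the positives of this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


def _softmax(logits: Sequence[float], T: float) -> List[float]:
    """Temperature-softened softmax (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            T: float = 2.0) -> float:
    """Hinton-style distillation term: T^2 * KL(teacher || student)."""
    p_t = _softmax(teacher_logits, T)
    p_s = _softmax(student_logits, T)
    kl = sum(pt * (math.log(pt) - math.log(max(ps, 1e-12)))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return T * T * kl


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(feats, [0, 0, 1, 1]))  # small: labels match geometry
    print(supervised_contrastive_loss(feats, [0, 1, 0, 1]))  # large: labels contradict geometry
    print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))         # 0.0 when outputs agree
```

In a cross-modal setting of the kind the abstract describes, the teacher logits would plausibly come from an image-assisted branch and the student logits from the point-only branch, but how CSFNet pairs them is not stated in this record.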

Pages: 15

References: 60 in total
