D2T-Net: A dual-domain transformer network exploiting spatial and channel dimensions for semantic segmentation of urban mobile laser scanning point clouds

Cited by: 1
Authors
Luo, Ziwei [1]
Zeng, Ziyin [2]
Wan, Jie [3]
Tang, Wei [4]
Jin, Zhongge [4]
Xie, Zhong [4]
Xu, Yongyang [4,5,6]
Affiliations
[1] China Univ Geosci, Sch Geog & Informat Engn, Wuhan 430074, Peoples R China
[2] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan 430079, Peoples R China
[3] China Univ Geosci, Key Lab Geol & Evaluat, Minist Educ, Wuhan, Peoples R China
[4] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[5] State Key Lab Geoinformat Engn, Xian 710054, Peoples R China
[6] Guangdong Hong Kong Macau Joint Lab Smart Cities, Shenzhen 518000, Peoples R China
Keywords
Mobile laser scanning point clouds; Semantic segmentation; Transformer; Deep learning; Urban scenes
DOI
10.1016/j.jag.2024.104039
Chinese Library Classification
TP7 [Remote Sensing Technology]
Discipline Classification Codes
081102; 0816; 081602; 083002; 1404
Abstract
Semantic segmentation is key to analyzing urban mobile laser scanning (MLS) point clouds. In recent years, the Transformer mechanism, known for capturing long-range contextual relationships, has attracted significant research attention in the field of 3D vision. However, computing global self-attention over 3D scenes incurs high computational costs and loses local detail. Our work introduces a Dual-Domain Transformer network (D2T-Net) adept at processing complex urban MLS point clouds. By operating in both the spatial and channel dimensions, it enables efficient semantic segmentation while preserving fine scene elements such as small urban objects. We introduce a Local Spatial-wise Transformer (LST) block that enriches local semantics through an improved self-attention mechanism, which incorporates relative embeddings and transfers spatial information across multiple representation subspaces in parallel. On top of a feature pyramid framework that fuses and refines features from the LST blocks, we introduce a Global Channel-wise Transformer (GCT) block, which efficiently captures global context by attending to inter-channel relationships of the features, with a controlled flow gate for selective information transfer. D2T-Net thus employs Transformers in both the spatial and channel domains to review and fuse features from multiple layers, effectively summarizing semantic context and enriching spatial details with multi-scale information. Experiments on three challenging benchmark MLS datasets, Oakland 3-D, Toronto-3D, and Paris-Lille-3D, confirm D2T-Net's accuracy, achieving 98.2%, 83.9%, and 83.8% mIoU, respectively.
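To make the channel-wise attention with a gated flow described in the abstract more concrete, the PyTorch sketch below shows one plausible way such a Global Channel-wise Transformer block could look. This is a minimal illustration under assumptions: the class name GlobalChannelAttention, the linear projections, the scaling, and the sigmoid gate are all illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of a channel-wise attention block with a gated residual,
# loosely following the GCT description; names and shapes are assumptions.
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    """Attends over feature channels (C x C affinity) instead of over points."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_q = nn.Linear(channels, channels, bias=False)
        self.to_k = nn.Linear(channels, channels, bias=False)
        self.to_v = nn.Linear(channels, channels, bias=False)
        # Gate that decides how much global channel context to pass through.
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) per-point features.
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Channel-channel affinity: (B, C, C); the attention map size is
        # independent of the number of points N.
        attn = torch.softmax(q.transpose(1, 2) @ k / x.shape[1] ** 0.5, dim=-1)
        ctx = v @ attn.transpose(1, 2)            # (B, N, C) globally mixed channels
        g = self.gate(torch.cat([x, ctx], dim=-1))
        return x + g * ctx                        # gated residual: selective information transfer

# Usage example with a dummy point-feature tensor.
feats = torch.randn(2, 4096, 64)                  # 2 point clouds, 4096 points, 64-D features
out = GlobalChannelAttention(64)(feats)           # -> (2, 4096, 64)
```

The key point of the sketch is that the attention matrix is C x C rather than N x N, which is why a channel-wise formulation can summarize global context for large point clouds at modest cost.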
Pages: 14