D2T-Net: A dual-domain transformer network exploiting spatial and channel dimensions for semantic segmentation of urban mobile laser scanning point clouds

Cited by: 1
Authors
Luo, Ziwei [1]
Zeng, Ziyin [2]
Wan, Jie [3]
Tang, Wei [4]
Jin, Zhongge [4]
Xie, Zhong [4]
Xu, Yongyang [4,5,6]
Affiliations
[1] China Univ Geosci, Sch Geog & Informat Engn, Wuhan 430074, Peoples R China
[2] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan 430079, Peoples R China
[3] China Univ Geosci, Key Lab Geol & Evaluat, Minist Educ, Wuhan, Peoples R China
[4] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[5] State Key Lab Geoinformat Engn, Xian 710054, Peoples R China
[6] Guangdong Hong Kong Macau Joint Lab Smart Cities, Shenzhen 518000, Peoples R China
Keywords
Mobile laser scanning point clouds; Semantic segmentation; Transformer; Deep learning; Urban scenes
DOI
10.1016/j.jag.2024.104039
Chinese Library Classification
TP7 [Remote Sensing Technology]
Discipline Classification Codes
081102; 0816; 081602; 083002; 1404
Abstract
Semantic segmentation is key to analyzing urban mobile laser scanning (MLS) point clouds. In recent years, the Transformer mechanism, known for capturing long-range contextual relationships, has attracted significant research attention in the field of 3D vision. However, computing global self-attention over 3D scenes incurs high computational costs and loses local detail. Our work introduces a Dual-Domain Transformer network (D2T-Net) adept at processing complex urban MLS point clouds. By operating in both the spatial and channel dimensions, it enables efficient semantic segmentation while preserving fine scene elements such as small urban objects. We introduce a Local Spatial-wise Transformer (LST) block that enriches local semantics through an improved self-attention mechanism, which incorporates relative embeddings and transfers spatial information across multiple representation subspaces in parallel. On top of a feature pyramid framework that fuses and refines features from the LST blocks, we introduce a Global Channel-wise Transformer (GCT) block, which efficiently captures global context by attending to inter-channel relationships of the features, with a controlled flow gate for selective information transfer. D2T-Net thus employs Transformers in both the spatial and channel domains to review and fuse features from multiple layers, effectively summarizing semantic context and enriching spatial details with multi-scale information. Experiments on three challenging benchmark MLS datasets, Oakland 3-D, Toronto-3D, and Paris-Lille-3D, confirm D2T-Net's accuracy, achieving 98.2%, 83.9%, and 83.8% mIoU, respectively.
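To make the channel-wise attention with a gated flow described in the abstract more concrete, the PyTorch sketch below shows one plausible way such a Global Channel-wise Transformer block could look. This is a minimal illustration under assumptions: the class name GlobalChannelAttention, the linear projections, the scaling, and the sigmoid gate are all illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of a channel-wise attention block with a gated residual,
# loosely following the GCT description; names and shapes are assumptions.
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    """Attends over feature channels (C x C affinity) instead of over points."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_q = nn.Linear(channels, channels, bias=False)
        self.to_k = nn.Linear(channels, channels, bias=False)
        self.to_v = nn.Linear(channels, channels, bias=False)
        # Gate that decides how much global channel context to pass through.
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) per-point features.
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Channel-channel affinity: (B, C, C); the attention map size is
        # independent of the number of points N.
        attn = torch.softmax(q.transpose(1, 2) @ k / x.shape[1] ** 0.5, dim=-1)
        ctx = v @ attn.transpose(1, 2)            # (B, N, C) globally mixed channels
        g = self.gate(torch.cat([x, ctx], dim=-1))
        return x + g * ctx                        # gated residual: selective information transfer

# Usage example with a dummy point-feature tensor.
feats = torch.randn(2, 4096, 64)                  # 2 point clouds, 4096 points, 64-D features
out = GlobalChannelAttention(64)(feats)           # -> (2, 4096, 64)
```

The key point of the sketch is that the attention matrix is C x C rather than N x N, which is why a channel-wise formulation can summarize global context for large point clouds at modest cost.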
Pages: 14