MCTNet: Multiscale Cross-Attention-Based Transformer Network for Semantic Segmentation of Large-Scale Point Cloud

Times cited: 9
Authors
Guo, Bo [1 ]
Deng, Liwei [2 ]
Wang, Ruisheng [3 ,4 ]
Guo, Wenchao [1 ]
Ng, Alex Hay-Man [1 ]
Bai, Wenfeng [5 ]
Affiliations
[1] Guangdong Univ Technol, Sch Civil & Transportat Engn, Guangzhou 510006, Peoples R China
[2] Guilin Univ Technol, Coll Geomatics & Geoinformat, Guilin 541004, Peoples R China
[3] Shenzhen Univ, Sch Architecture & Urban Planning, Shenzhen 518060, Peoples R China
[4] Univ Calgary, Fac Schulich, Sch Engn, Calgary, AB T2N 1N4, Canada
[5] Res Inst Co Ltd, Geotech Branch Guangzhou Metro Design, Guangzhou 510010, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61
Funding
National Natural Science Foundation of China;
Keywords
Cross-attention; long-range dependency; point cloud; segmentation; CLASSIFICATION;
DOI
10.1109/TGRS.2023.3322579
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
In this work, we implement a hybrid method that makes full use of available information by aggregating both fine-grained and global contextual features for point cloud semantic segmentation with a hierarchical network. To overcome the limitation of convolution operations, which mainly extract low-level features, we combine them with higher level cross-attention-based transformers that model the importance of long-range relations, together with position embedding, for multiscale feature representation. Specifically, a learnable token is added to the feature sequence of a layer, and a transformer encoder with limited scope is first applied to embed these features. Furthermore, instead of performing all-to-all attention, we fuse only tokens spanning different scales. To improve efficiency, we propose a simple yet efficient token-fusing architecture based on cross-attention, in which computing the attention maps takes only linear time because a single token is used to calculate the query. The cross-attention module can be efficiently aggregated into a multiscale network to further enlarge the receptive field of attention. Experiments show that our multiscale cross-attention-based transformer network (MCTNet) achieves promising results on the three largest point cloud datasets: DALES, DublinCity, and S3DIS. On the DALES benchmark dataset, MCTNet improves the mean intersection-over-union (mIoU) to 83.3% and the overall accuracy (OA) to 98.3%, outperforming other existing baselines. We also perform extensive ablation studies on various attention and normalization modules and discuss the effect of parameters, both to validate the descriptive power of the cross-attention modules and to provide an understanding of how long-range dependency can be used to learn fair and unbiased features.
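The abstract attributes the linear cost of the token-fusing module to computing the query from a single token. The following is a minimal plain-Python sketch of that idea, assuming a CrossViT-style fusion in which one learnable class token from one scale supplies the only query, while keys and values come from that token plus the N tokens of another scale. The function names, the d x d projection matrices, and the scaled dot-product form are illustrative assumptions, not the authors' MCTNet implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(scores):
    """Numerically stable softmax over a 1-D list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols):
    """Hypothetical random initialisation, used only for the demo."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def single_query_cross_attention(cls_token, other_tokens, w_q, w_k, w_v):
    """Fuse one learnable token of scale A with the N tokens of scale B (sketch).

    cls_token:      length-d token of scale A (hypothetical name)
    other_tokens:   N tokens of length d from scale B
    w_q, w_k, w_v:  d x d projection matrices (hypothetical)
    Returns the fused length-d token and the N + 1 attention weights.
    """
    # Only the class token produces a query: a single 1 x d row.
    q = matmul([cls_token], w_q)[0]
    # Assumed CrossViT-style choice: keys/values include the class token itself.
    kv_input = [cls_token] + other_tokens
    keys = matmul(kv_input, w_k)      # (N + 1) x d
    values = matmul(kv_input, w_v)    # (N + 1) x d
    scale = 1.0 / math.sqrt(len(q))
    # One row of scores: O(N * d) work instead of O(N^2 * d) for all-to-all attention.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    # Weighted sum of value rows gives the fused token.
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(q))]
    return fused, weights


if __name__ == "__main__":
    random.seed(0)
    d, n = 4, 6
    cls = random_matrix(1, d)[0]
    tokens = random_matrix(n, d)
    fused, weights = single_query_cross_attention(
        cls, tokens, random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
    )
    print(len(fused), len(weights), round(sum(weights), 6))  # prints "4 7 1.0"
```

Because the query is a single row, the attention map has N + 1 entries rather than roughly N², and the key and value projections are also linear in N; this is the sense in which restricting the query keeps the cost linear in the number of tokens.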
Pages: 20
Related papers (50 in total)
  • [21] A Dual Attention Neural Network for Airborne LiDAR Point Cloud Semantic Segmentation
    Zhang, Ka
    Ye, Longjie
    Xiao, Wen
    Sheng, Yehua
    Zhang, Shan
    Tao, Xia
    Zhou, Yaqin
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [22] Multistage Scene-Level Constraints for Large-Scale Point Cloud Weakly Supervised Semantic Segmentation
    Su, Yanfei
    Cheng, Ming
    Yuan, Zhimin
    Liu, Weiquan
    Zeng, Wankang
    Wang, Cheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [23] Enhanced Local Feature Learning With Simple Offset Attention for Semantic Segmentation of Large-Scale Point Clouds
    Chen, Dong
    Wang, Yuebin
    Zhang, Liqiang
    Kang, Zhizhong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [24] MHNet: Multiscale Hierarchical Network for 3D Point Cloud Semantic Segmentation
    Liang, Xiaoli
    Fu, Zhongliang
    IEEE ACCESS, 2019, 7 : 173999 - 174012
  • [25] Active Spatio-Fine Enhancement Network for Semantic Segmentation of Large-Scale Point Clouds
    Chen, Xijiang
    Wang, Zihao
    Zhao, Bufan
    Qin, Mengjiao
    Han, Xianquan
    Ozdemir, Emirhan
    IEEE SENSORS JOURNAL, 2024, 24 (22) : 37358 - 37379
  • [26] Advancements in Semantic Segmentation Methods for Large-Scale Point Clouds Based on Deep Learning
    Ai Da
    Zhang Xiaoyang
    Xu Ce
    Qin Siyu
    Yuan Hui
    LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (12)
  • [27] MNAT-Net: Multi-Scale Neighborhood Aggregation Transformer Network for Point Cloud Classification and Segmentation
    Wang, Xuchu
    Yuan, Yue
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (08) : 9153 - 9167
  • [28] Multilevel Geometric Feature Embedding in Transformer Network for ALS Point Cloud Semantic Segmentation
    Liang, Zhuanxin
    Lai, Xudong
    REMOTE SENSING, 2024, 16 (18)
  • [29] A dual projection method for semantic segmentation of large-scale point clouds
    Zhao, Haoying
    Zhou, Aimin
    VISUAL COMPUTER, 2025
  • [30] GSIP: Green Semantic Segmentation of Large-Scale Indoor Point Clouds
    Zhang, Min
    Kadam, Pranav
    Liu, Shan
    Kuo, C. -C. Jay
    PATTERN RECOGNITION LETTERS, 2022, 164 : 9 - 15