MCTNet: Multiscale Cross-Attention-Based Transformer Network for Semantic Segmentation of Large-Scale Point Cloud

Times cited: 9
Authors
Guo, Bo [1]
Deng, Liwei [2]
Wang, Ruisheng [3,4]
Guo, Wenchao [1]
Ng, Alex Hay-Man [1]
Bai, Wenfeng [5]
Affiliations
[1] Guangdong Univ Technol, Sch Civil & Transportat Engn, Guangzhou 510006, Peoples R China
[2] Guilin Univ Technol, Coll Geomatics & Geoinformat, Guilin 541004, Peoples R China
[3] Shenzhen Univ, Sch Architecture & Urban Planning, Shenzhen 518060, Peoples R China
[4] Univ Calgary, Fac Schulich, Sch Engn, Calgary, AB T2N 1N4, Canada
[5] Res Inst Co Ltd, Geotech Branch Guangzhou Metro Design, Guangzhou 510010, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61
Funding
National Natural Science Foundation of China
Keywords
Cross-attention; long-range dependency; point cloud; segmentation; classification
DOI
10.1109/TGRS.2023.3322579
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
In this work, we implement a hybrid method that exploits sufficient information by aggregating both fine-grained and global contextual features for point cloud semantic segmentation with a hierarchical network. To overcome the limitation of convolution operations, which mainly extract low-level features, we combine them with higher level cross-attention-based transformers to investigate the importance of long-range relations, together with position embedding, for multiscale feature representation. Specifically, a learnable token is added to the feature sequence of a layer, and a transformer encoder with limited scope is first applied to embed these features. Furthermore, instead of performing all-to-all attention, we fuse only tokens spanning various scales. To improve efficiency, we propose a simple yet efficient token-fusing architecture based on cross-attention, in which the computation of attention maps is restricted to linear time by using only a single token to compute the query. The cross-attention module can be efficiently aggregated in a multiscale network to further enlarge the receptive field of attention. Experiments show that our multiscale cross-attention-based transformer network (MCTNet) achieves promising results on the three largest point cloud datasets: DALES, DublinCity, and S3DIS. On the DALES benchmark, MCTNet achieves a mean intersection-over-union (mIoU) of 83.3% and an overall accuracy (OA) of 98.3%, outperforming existing baselines. We also perform extensive ablation studies on various attention and normalization modules and discuss the effect of parameters, both to validate the descriptive power of cross-attention modules and to explain how long-range dependency can be used to learn fair and unbiased features.
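The linear-time claim rests on the query being formed from a single token. Below is a minimal pure-Python sketch of that idea, assuming a single-head, CrossViT-style formulation in which the learnable token of one scale is the only query, and the keys and values come from that token plus the token sequence of another scale. The function names, the projection matrices `Wq`, `Wk`, `Wv`, and the random inputs are hypothetical illustrations, not MCTNet's published implementation; a full model would typically add multi-head projections, position embedding, and normalization, none of which are shown here.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_query_cross_attention(cls_token, other_tokens, Wq, Wk, Wv):
    """Cross-attention in which one learnable token forms the only query.

    cls_token:    d-dim token from one scale (hypothetical input).
    other_tokens: list of N d-dim tokens from another scale.
    Returns the fused d-dim token and the 1 x (N + 1) attention weights.
    Only one query row is scored, so the cost grows linearly with N.
    """
    d = len(cls_token)
    q = matvec(Wq, cls_token)
    # Keys/values cover the query token plus the other scale's tokens
    # (a CrossViT-style assumption, not confirmed for MCTNet).
    seq = [cls_token] + other_tokens
    keys = [matvec(Wk, t) for t in seq]
    values = [matvec(Wv, t) for t in seq]
    scale = 1.0 / math.sqrt(d)
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
    return fused, weights


def attention_score_entries(n_tokens, single_query=True):
    """Count query-key scores for n_tokens plus one learnable token."""
    length = n_tokens + 1
    return length if single_query else length * length


def random_matrix(d):
    """Hypothetical random d x d projection matrix."""
    return [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]


random.seed(0)
d, n = 4, 6
Wq, Wk, Wv = random_matrix(d), random_matrix(d), random_matrix(d)
cls = [random.gauss(0, 1) for _ in range(d)]
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
fused, weights = single_query_cross_attention(cls, tokens, Wq, Wk, Wv)
print(len(fused), len(weights))  # 4 7
print(attention_score_entries(1000), attention_score_entries(1000, single_query=False))  # 1001 1002001
```

Because only one query row is computed, the attention map has N + 1 entries instead of the (N + 1)^2 entries of all-to-all attention, which is where the linear cost in the number of tokens comes from.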
Pages: 20