MCTNet: Multiscale Cross-Attention-Based Transformer Network for Semantic Segmentation of Large-Scale Point Cloud

Cited by: 9
Authors
Guo, Bo [1]
Deng, Liwei [2]
Wang, Ruisheng [3, 4]
Guo, Wenchao [1]
Ng, Alex Hay-Man [1]
Bai, Wenfeng [5]
Affiliations
[1] Guangdong Univ Technol, Sch Civil & Transportat Engn, Guangzhou 510006, Peoples R China
[2] Guilin Univ Technol, Coll Geomatics & Geoinformat, Guilin 541004, Peoples R China
[3] Shenzhen Univ, Sch Architecture & Urban Planning, Shenzhen 518060, Peoples R China
[4] Univ Calgary, Fac Schulich, Sch Engn, Calgary, AB T2N 1N4, Canada
[5] Res Inst Co Ltd, Geotech Branch Guangzhou Metro Design, Guangzhou 510010, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-attention; long-range dependency; point cloud; segmentation; CLASSIFICATION;
DOI
10.1109/TGRS.2023.3322579
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
In this work, we implement a hybrid method to utilize sufficient information by aggregating both fine-grained and globally contextual features for point cloud semantic segmentation with a hierarchical network. By surpassing the defects of convolution operation mainly for extracting low-level features, we combine higher level cross-attention-based transformers to investigate the importance of long-range relations together with position embedding for multiscale feature representation. Specifically, by adding a learnable token to the feature sequence of a layer, a transformer encoder is first implemented with limited scope to embed these features. Furthermore, instead of performing all-to-all attention, we merely fuse tokens spanning various scales. To improve efficiency, we propose a simple yet efficient token-fusing architecture based on cross-attention, in which the computation of attention maps can be restricted within linear time by only using a token to calculate the query. The cross-attention module can be efficiently aggregated in a multiscale network to further enlarge the scope of the receptive field for attention. Experiments show that our multiscale cross-attention-based transformer network (MCTNet) achieves promising results on the three largest point cloud datasets, DALES, DublinCity, and S3DIS datasets. For the DALES benchmark dataset, MCTNet improves the mean intersection-over-union (mIoU) to 83.3% and the overall accuracy (OA) to 98.3%, which outperforms other existing baselines. We also perform abundant ablation studies on various attention and normalization modules and discuss the effect of parameters to validate the descriptive power of cross-attention modules and provide an understanding of how long-range dependency can be used to learn fair and unbiased features.
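The abstract's point about restricting attention to linear time by using a single token as the query can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the projection matrices `w_q`, `w_k`, `w_v`, and the dimensions are assumptions chosen for illustration only. With one query token attending to n feature tokens, the score vector has n entries, so the cost grows linearly in n rather than quadratically as in all-to-all attention.

```python
import numpy as np

def single_token_cross_attention(cls_token, tokens, w_q, w_k, w_v):
    """Cross-attention with a single query token (illustrative sketch).

    cls_token: (d,) learnable token used as the only query (hypothetical name).
    tokens:    (n, d) feature tokens, e.g. from another scale.
    Returns the fused token (d_v,) and the attention weights (n,).
    """
    q = cls_token @ w_q                      # (d_k,)  one query
    k = tokens @ w_k                         # (n, d_k) keys
    v = tokens @ w_v                         # (n, d_v) values
    scores = k @ q / np.sqrt(q.shape[0])     # (n,) scores: O(n), not O(n^2)
    scores = scores - scores.max()           # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ v, weights

# Toy dimensions, assumed for the example only.
rng = np.random.default_rng(0)
d, n = 8, 16
cls_token = rng.normal(size=d)
tokens = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

fused, weights = single_token_cross_attention(cls_token, tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

The fused token could then be appended back to its own scale's sequence, which is one plausible reading of "fusing tokens spanning various scales"; the abstract does not specify this detail.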
Pages: 20
Related papers
50 in total
  • [1] RAAFNet: Reverse Attention Adaptive Fusion Network for Large-Scale Point Cloud Semantic Segmentation
    Wang, Kai
    Zhang, Huanhuan
    MATHEMATICS, 2024, 12 (16)
  • [2] Radial Transformer for Large-Scale Outdoor LiDAR Point Cloud Semantic Segmentation
    He, Xiang
    Li, Xu
    Ni, Peizhou
    Xu, Wang
    Xu, Qimin
    Liu, Xixiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [3] PointNAT: Large-Scale Point Cloud Semantic Segmentation via Neighbor Aggregation With Transformer
    Zeng, Ziyin
    Qiu, Huan
    Zhou, Jian
    Dong, Zhen
    Xiao, Jinsheng
    Li, Bijun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 18
  • [4] Semantic segmentation of large-scale point clouds by integrating attention mechanisms and transformer models
    Yuan, Tiebiao
    Yu, Yangyang
    Wang, Xiaolong
    IMAGE AND VISION COMPUTING, 2024, 146
  • [5] A Large-Scale Network Construction and Lightweighting Method for Point Cloud Semantic Segmentation
    Han, Jiawei
    Liu, Kaiqi
    Li, Wei
    Chen, Guangzhi
    Wang, Wenguang
    Zhang, Feng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2004 - 2017
  • [6] Dense Dual-Branch Cross Attention Network for Semantic Segmentation of Large-Scale Point Clouds
    Luo, Ziwei
    Zeng, Ziyin
    Tang, Wei
    Wan, Jie
    Xie, Zhong
    Xu, Yongyang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 16
  • [7] TCFAP-Net: Transformer-based Cross-feature Fusion and Adaptive Perception Network for large-scale point cloud semantic segmentation
    Zhang, Jianjun
    Jiang, Zhipeng
    Qiu, Qinjun
    Liu, Zheng
    PATTERN RECOGNITION, 2024, 154
  • [8] MPT-Net: Mask Point Transformer Network for Large Scale Point Cloud Semantic Segmentation
    Tang, Zhe Jun
    Cham, Tat-Jen
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022 : 10611 - 10618
  • [9] Point and voxel cross perception with lightweight cosformer for large-scale point cloud semantic segmentation
    Zhang, Shuai
    Wang, Biao
    Chen, Yiping
    Zhang, Shuhang
    Zhang, Wuming
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 131
  • [10] Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud
    Zhang, Yachao
    Li, Zonghao
    Xie, Yuan
    Qu, Yanyun
    Li, Cuihua
    Mei, Tao
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 3421 - 3429