3DGTN: 3-D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation

Cited by: 6
Authors
Lu, Dening [1]
Gao, Kyle [1]
Xie, Qian [2]
Xu, Linlin [1]
Li, Jonathan [3,4]
Affiliations
[1] Univ Waterloo, Dept Syst Design Engn, Waterloo, ON N2L 3G1, Canada
[2] Univ Oxford, Dept Comp Sci, Oxford OX1 3QD, England
[3] Univ Waterloo, Dept Syst Design Engn, Waterloo, ON N2L 3G1, Canada
[4] Univ Waterloo, Dept Geog & Environm Management, Waterloo, ON N2L 3G1, Canada
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024, Vol. 62
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Point cloud compression; Feature extraction; Three-dimensional displays; Decoding; Task analysis; Representation learning; Graph convolution; LiDAR data processing; point cloud classification; point cloud segmentation; self-attention mechanism; transformer;
DOI
10.1109/TGRS.2024.3393845
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry];
Discipline Codes
0708; 070902;
Abstract
Although the application of Transformers to 3-D point cloud processing has achieved significant progress, it remains challenging for existing 3-D Transformer methods to efficiently and accurately learn both valuable global and local features. This article presents a novel point cloud representation learning network, called the 3-D Dual Self-Attention Global Local (GLocal) Transformer Network (3DGTN), for improved feature learning in both classification and segmentation tasks, with the following key contributions. First, a GLocal feature learning (GFL) block with a dual self-attention mechanism [i.e., a novel point-patch self-attention (PPSA) and a channel-wise self-attention (CSA)] is designed to efficiently learn global and local context information. Second, the GFL block is integrated with a multiscale graph-convolution-based local feature aggregation (LFA) block, yielding a GLocal information extraction module that captures critical information efficiently. Third, a series of GLocal modules is used to construct a new hierarchical encoder-decoder structure, enabling the learning of information at different scales. The proposed framework is evaluated on both classification and segmentation datasets, demonstrating that it outperforms many state-of-the-art methods on both synthetic and LiDAR data. Our code has been released at https://github.com/d62lu/3DGTN.
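To make the dual self-attention idea concrete, below is a minimal PyTorch sketch of the channel-wise branch (CSA), which attends over the C feature channels rather than the N points. The module name, tensor shapes, scaling, and residual wiring are illustrative assumptions, not the authors' implementation; the actual model is in the repository linked above.

```python
# Minimal sketch of channel-wise self-attention (CSA) for point features.
# Assumed input layout: (B, N, C) = batch, points, channels. Illustrative
# only; see https://github.com/d62lu/3DGTN for the authors' real code.
import torch
import torch.nn as nn


class ChannelWiseSelfAttention(nn.Module):
    """Self-attention computed across feature channels instead of points."""

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Linear(channels, channels, bias=False)
        self.k = nn.Linear(channels, channels, bias=False)
        self.v = nn.Linear(channels, channels, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        # (B, C, N) @ (B, N, C) -> (B, C, C): channel-to-channel affinity,
        # scaled by sqrt(N) since the contraction runs over the N points.
        attn = torch.softmax(q.transpose(1, 2) @ k / x.shape[1] ** 0.5, dim=-1)
        # Re-mix channels: (B, N, C) @ (B, C, C) -> (B, N, C), plus residual.
        return x + v @ attn.transpose(1, 2)


if __name__ == "__main__":
    points = torch.randn(2, 1024, 64)  # toy batch of per-point features
    csa = ChannelWiseSelfAttention(64)
    print(csa(points).shape)           # torch.Size([2, 1024, 64])
```

In the paper's design, this channel-wise attention complements the point-patch self-attention (PPSA), which instead attends over the point/patch tokens; together they form the dual self-attention of the GFL block.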
Pages: 1-13
Page count: 13