Masked Autoencoders in 3D Point Cloud Representation Learning

Cited by: 4
Authors:
Jiang, Jincen [1 ]
Lu, Xuequan [2 ]
Zhao, Lizhi [1 ]
Dazeley, Richard [3 ]
Wang, Meili [1 ]
Affiliations:
[1] NorthWest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] La Trobe Univ, Dept Comp Sci & IT, Melbourne, Vic 3000, Australia
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
Keywords:
Point cloud compression; Transformers; Task analysis; Feature extraction; Three-dimensional displays; Solid modeling; Decoding; Self-supervised learning; point cloud; completion; NETWORK;
DOI:
10.1109/TMM.2023.3314973
CLC number:
TP [Automation technology, computer technology]
Discipline code:
0812
Abstract:
Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. Recently, methods based on masked autoencoders have been explored in this field. Their inputs can be masked intuitively because the content is regular, such as word sequences and 2D pixel grids; extending the idea to 3D point clouds, however, is challenging because of their irregularity. In this article, we propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract features from the unmasked patches. Next, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module, trained with a multi-task loss, reconstructs the complete point cloud from the incomplete input. We conduct self-supervised pre-training on ShapeNet55 with a point cloud completion pretext task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments demonstrate that the local features our MAE3D extracts from point cloud patches benefit downstream classification tasks, soundly outperforming state-of-the-art methods (93.4% and 86.2% classification accuracy, respectively).
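The abstract describes a patch-based mask-and-reconstruct pipeline: split the cloud into patches, mask most of them, encode the visible ones, and train the decoder to complete the missing geometry. The sketch below illustrates a generic version of the patch-splitting, random-masking, and Chamfer-style reconstruction-loss steps that such a pipeline relies on. It is a minimal PyTorch sketch under stated assumptions: the FPS-plus-kNN grouping, the function names, and the hyper-parameters (num_patches, patch_size, mask_ratio) are illustrative choices, not the authors' exact MAE3D modules or their multi-task loss.

```python
# Minimal sketch of patch splitting, random masking, and a Chamfer-style
# reconstruction loss for MAE-style point cloud pre-training.
# Grouping strategy and hyper-parameters are illustrative assumptions,
# not the exact MAE3D configuration.
import torch


def farthest_point_sample(xyz: torch.Tensor, num_centers: int) -> torch.Tensor:
    """Pick `num_centers` well-spread points from an (N, 3) cloud; returns their indices."""
    n = xyz.shape[0]
    centers = torch.zeros(num_centers, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = torch.randint(0, n, (1,)).item()
    for i in range(num_centers):
        centers[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=-1)   # squared distance to newest center
        dist = torch.minimum(dist, d)                  # distance to nearest chosen center so far
        farthest = torch.argmax(dist).item()           # next center: point farthest from all chosen ones
    return centers


def split_and_mask(xyz: torch.Tensor, num_patches: int = 64,
                   patch_size: int = 32, mask_ratio: float = 0.75):
    """Group an (N, 3) cloud into kNN patches around FPS centers, then randomly mask a ratio of them."""
    center_idx = farthest_point_sample(xyz, num_patches)
    centers = xyz[center_idx]                                                     # (P, 3) patch centers
    knn_idx = torch.cdist(centers, xyz).topk(patch_size, largest=False).indices   # (P, K) nearest neighbors
    patches = xyz[knn_idx]                                                        # (P, K, 3) local patches
    num_masked = int(mask_ratio * num_patches)
    perm = torch.randperm(num_patches)
    masked_idx, visible_idx = perm[:num_masked], perm[num_masked:]
    # Visible patches go to the encoder; masked patches become reconstruction targets.
    return patches[visible_idx], patches[masked_idx], centers


def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between two point sets, a common completion loss term."""
    d = torch.cdist(pred, gt)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()


if __name__ == "__main__":
    cloud = torch.rand(1024, 3)                                  # toy input cloud
    visible, masked, centers = split_and_mask(cloud)
    print(visible.shape, masked.shape)                           # torch.Size([16, 32, 3]) torch.Size([48, 32, 3])
    target = masked.reshape(-1, 3)
    fake_prediction = target + 0.01 * torch.randn_like(target)   # stand-in for a decoder's output
    print(chamfer_distance(fake_prediction, target))
```

In MAE-style training only the visible patches are embedded and passed through the Transformer encoder, so a high mask ratio keeps pre-training cheap while forcing the model to infer the masked geometry; the Chamfer term shown here is one common ingredient of completion losses, and the paper's actual multi-task loss may combine further terms.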
Pages: 820-831 (12 pages)