MaskLRF: Self-Supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-Invariant 3D Point Set Analysis

Cited by: 0
Authors
Furuya, Takahiko [1]
Affiliation
[1] Univ Yamanashi, Integrated Grad Sch Med Engn & Agr Sci, Kofu, Yamanashi 4008511, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
Japan Society for the Promotion of Science;
Keywords
Three-dimensional displays; Task analysis; Shape; Transformers; Feature extraction; Solid modeling; Encoding; Point cloud compression; Representation learning; Self-supervised learning; 3D point cloud; deep learning; masked autoencoding; representation learning; self-supervised pretraining;
DOI
10.1109/ACCESS.2024.3404016
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Following the successes in the fields of vision and language, self-supervised pretraining via masked autoencoding of 3D point set data, or Masked Point Modeling (MPM), has achieved state-of-the-art accuracy in various downstream tasks. However, current MPM methods lack a property essential for 3D point set analysis, namely, invariance against rotation of 3D objects/scenes. Existing MPM methods are thus not necessarily suitable for real-world applications where 3D point sets may have inconsistent orientations. This paper develops, for the first time, a rotation-invariant self-supervised pretraining framework for practical 3D point set analysis. The proposed algorithm, called MaskLRF, learns rotation-invariant and highly generalizable latent features via masked autoencoding of 3D points within Local Reference Frames (LRFs), which are not affected by rotation of 3D point sets. MaskLRF enhances the quality of latent features by integrating feature refinement using relative pose encoding and feature reconstruction using low-level but rich 3D geometry. The efficacy of MaskLRF is validated via extensive experiments on diverse downstream tasks including classification, segmentation, registration, and domain adaptation. The experiments demonstrate that MaskLRF achieves new state-of-the-art accuracies in analyzing 3D point sets having inconsistent orientations. Code will be available at: https://github.com/takahikof/MaskLRF.
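The abstract's claim that coordinates expressed in a Local Reference Frame are unaffected by rotation can be illustrated with a small sketch. This is not the MaskLRF implementation: it assumes a simple PCA-based frame with a skewness-based sign rule, and the names `local_reference_frame`, `to_local`, and `random_rotation` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def local_reference_frame(points):
    """Return (centroid, axes) of a PCA-based LRF (an assumed, simplified construction)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Eigenvectors of the scatter matrix rotate together with the points.
    # eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(centered.T @ centered)
    x_axis, y_axis = vecs[:, 2], vecs[:, 1]  # two largest-variance directions
    # Each eigenvector is only defined up to sign; fix the sign from the data
    # so that the projections have non-negative skew (assumes the skew is nonzero).
    x_axis = x_axis * np.sign(np.sum((centered @ x_axis) ** 3))
    y_axis = y_axis * np.sign(np.sum((centered @ y_axis) ** 3))
    # The cross product completes a right-handed frame.
    z_axis = np.cross(x_axis, y_axis)
    return centroid, np.stack([x_axis, y_axis, z_axis], axis=1)


def to_local(points):
    """Express points in their own LRF."""
    centroid, axes = local_reference_frame(points)
    return (points - centroid) @ axes


def random_rotation(rng):
    """Draw a proper rotation matrix (determinant +1) via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
# A skewed, anisotropic toy patch so the axes and their signs are well defined.
patch = rng.exponential(size=(64, 3)) * np.array([3.0, 2.0, 1.0])
rotated = patch @ random_rotation(rng).T
# The local coordinates of the original and rotated patch should coincide.
print(np.allclose(to_local(patch), to_local(rotated)))
```

Because the frame is derived from the points themselves, rotating the input rotates the frame by the same amount, so the local coordinates stay fixed. The sketch breaks down for patches with repeated eigenvalues or zero skew along an axis, where the frame is ambiguous.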
Pages: 73340-73353
Page count: 14