MaskLRF: Self-Supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-Invariant 3D Point Set Analysis

Times cited: 0

Author:

Furuya, Takahiko [1]

Affiliation:

[1] Univ Yamanashi, Integrated Grad Sch Med Engn & Agr Sci, Kofu, Yamanashi 4008511, Japan

Source:

IEEE ACCESS | 2024 / Vol. 12

Funding:

Japan Society for the Promotion of Science (JSPS);

Keywords:

Three-dimensional displays; Task analysis; Shape; Transformers; Feature extraction; Solid modeling; Encoding; Point cloud compression; Representation learning; Self-supervised learning; 3D point cloud; deep learning; masked autoencoding; representation learning; self-supervised pretraining;

DOI:

10.1109/ACCESS.2024.3404016

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Following the successes in the fields of vision and language, self-supervised pretraining via masked autoencoding of 3D point set data, or Masked Point Modeling (MPM), has achieved state-of-the-art accuracy in various downstream tasks. However, current MPM methods lack a property essential for 3D point set analysis, namely, invariance against rotation of 3D objects/scenes. Existing MPM methods are thus not necessarily suitable for real-world applications where 3D point sets may have inconsistent orientations. This paper develops, for the first time, a rotation-invariant self-supervised pretraining framework for practical 3D point set analysis. The proposed algorithm, called MaskLRF, learns rotation-invariant and highly generalizable latent features via masked autoencoding of 3D points within Local Reference Frames (LRFs), which are not affected by rotation of 3D point sets. MaskLRF enhances the quality of latent features by integrating feature refinement using relative pose encoding and feature reconstruction using low-level but rich 3D geometry. The efficacy of MaskLRF is validated via extensive experiments on diverse downstream tasks including classification, segmentation, registration, and domain adaptation. The experiments demonstrate that MaskLRF achieves new state-of-the-art accuracies in analyzing 3D point sets having inconsistent orientations. Code will be available at: https://github.com/takahikof/MaskLRF.
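The rotation-invariance argument in the abstract rests on one property: coordinates expressed in a Local Reference Frame (LRF) do not change when the whole point set is rotated, and neither does the relative rotation between two LRFs. The sketch below checks both properties numerically. It is a minimal illustration under stated assumptions, not the MaskLRF implementation. The LRF here is a generic PCA frame over the k nearest neighbours with a simple sign-disambiguation heuristic. The helper names (`knn_offsets`, `local_reference_frame`, `random_rotation`) are hypothetical, and no masking or autoencoding is shown.

```python
import numpy as np


def knn_offsets(points: np.ndarray, center_idx: int, k: int = 16) -> np.ndarray:
    """Offsets from points[center_idx] to its k nearest neighbours (center included)."""
    center = points[center_idx]
    dists = np.linalg.norm(points - center, axis=1)
    return points[np.argsort(dists)[:k]] - center


def local_reference_frame(offsets: np.ndarray) -> np.ndarray:
    """3x3 matrix whose columns are the LRF axes x, y, z (hypothetical PCA-based LRF)."""
    cov = offsets.T @ offsets / len(offsets)
    _, vecs = np.linalg.eigh(cov)              # eigenvalues come back in ascending order
    x, y = vecs[:, 2], vecs[:, 1]              # directions of largest and middle variance
    # Eigenvectors are only defined up to sign: flip each axis so that the
    # neighbours' projections onto it sum to a non-negative value.
    if (offsets @ x).sum() < 0:
        x = -x
    if (offsets @ y).sum() < 0:
        y = -y
    z = np.cross(x, y)                         # right-handed, so det = +1
    return np.stack([x, y, z], axis=1)


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random matrix in SO(3) via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
pts = rng.normal(size=(256, 3))
Q = random_rotation(rng)
rot = pts @ Q.T                                # rotate every point: p' = Q p

off_i, off_j = knn_offsets(pts, 0), knn_offsets(pts, 1)
rot_i, rot_j = knn_offsets(rot, 0), knn_offsets(rot, 1)
R_i, R_j = local_reference_frame(off_i), local_reference_frame(off_j)
S_i, S_j = local_reference_frame(rot_i), local_reference_frame(rot_j)

# Local patch coordinates are unchanged by the global rotation ...
print(np.allclose(off_i @ R_i, rot_i @ S_i))    # True
# ... and so is the relative rotation between two LRFs.
print(np.allclose(R_i.T @ R_j, S_i.T @ S_j))    # True
```

The sign step matters. Because eigenvectors are only defined up to sign, the frame could otherwise flip under rotation and the local coordinates would no longer match. Deriving z as the cross product of x and y keeps the frame right-handed, so each LRF is a proper rotation matrix. That is why the relative pose `R_i.T @ R_j` is also rotation-invariant.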


Pages: 73340-73353

Page count: 14
