Masked Structural Point Cloud Modeling to Learning 3D Representation

Cited by: 0

Authors:

Yamada, Ryosuke [1,2]

Tadokoro, Ryu [2]

Qiu, Yue [2]

Kataoka, Hirokatsu [2]

Satoh, Yutaka [1,2]

Affiliations:

[1] Univ Tsukuba, Grad Sch Sci & Technol, Tsukuba 3058577, Japan

[2] Natl Inst Adv Ind Sci & Technol, Tsukuba 3058560, Japan

Source:

IEEE Access | 2024 / Vol. 12

Funding:

Japan Society for the Promotion of Science;

Keywords:

Deep learning; computer vision; 3D object recognition; point cloud; transfer learning; self-supervised learning; formula-driven supervised learning; NETWORK;

DOI:

10.1109/ACCESS.2024.3470971

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Pre-training for 3D object recognition typically requires a large-scale 3D dataset to learn effective 3D geometric representations. However, constructing such datasets is costly due to the extensive 3D data collection and human annotation required. This paper explores a synthetic pre-training approach that learns 3D geometric representations by reconstructing structural point clouds without relying on real data or human annotation. We propose the Point Cloud Perlin Noise (PCPN) dataset, which is an automatically generated point cloud dataset that uses Perlin noise to simulate natural 3D structures found in the real world. The proposed method enables the rapid generation of diverse 3D geometric patterns using a simple Perlin noise-based formula, significantly reducing the human effort typically involved in creating conventional 3D datasets. We applied PointMAE to the PCPN dataset for pre-training, demonstrating improved performance in downstream tasks such as 3D shape classification and part segmentation. Our experiments showed that the proposed pre-trained model outperformed a model trained from scratch on ModelNet40 by 1.4%. In addition, our pre-training strategy proves effective for 3D object recognition without requiring real data or supervised labels. This study highlights that Perlin noise can capture 3D structural properties and that the diversity of geometric patterns is crucial for learning effective 3D geometric representations.
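The abstract does not give the PCPN generation formula, so the following is only a minimal sketch of the general idea: evaluate classic 3D gradient (Perlin-style) noise and keep random candidate points where the noise exceeds a threshold. The function names `perlin3d` and `make_point_cloud`, the lattice size `grid`, the threshold, and the sampling scheme are all illustrative assumptions, not the authors' method.

```python
import numpy as np


def perlin3d(points, grid=4, seed=0):
    """Perlin-style gradient noise at points in [0, 1)^3 (illustrative, not the paper's formula)."""
    rng = np.random.default_rng(seed)
    # One random unit gradient per lattice corner; `grid` is an assumed lattice size.
    grads = rng.normal(size=(grid + 1, grid + 1, grid + 1, 3))
    grads /= np.linalg.norm(grads, axis=-1, keepdims=True)

    p = points * grid                  # scale into lattice coordinates
    i0 = np.floor(p).astype(int)       # lower lattice corner of each point's cell
    f = p - i0                         # fractional position inside the cell
    w = f * f * f * (f * (f * 6 - 15) + 10)  # quintic fade curve

    value = np.zeros(len(points))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = np.array([dx, dy, dz])
                g = grads[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                # Dot product of the corner gradient with the offset from that corner.
                dot = np.sum(g * (f - corner), axis=1)
                # Trilinear blend weight built from the fade curve.
                weight = np.prod(np.where(corner == 1, w, 1 - w), axis=1)
                value += weight * dot
    return value


def make_point_cloud(n_points=1024, threshold=0.0, seed=0):
    """Keep random candidates whose noise value exceeds `threshold` (assumed sampling scheme)."""
    rng = np.random.default_rng(seed)
    candidates = rng.random((n_points * 20, 3))  # oversample, then filter
    keep = candidates[perlin3d(candidates, seed=seed) > threshold]
    return keep[:n_points]


cloud = make_point_cloud()
print(cloud.shape)
```

Changing `seed`, `grid`, or `threshold` yields different shapes without any real data or labels, which is the property the abstract attributes to formula-driven generation; a masked autoencoder such as PointMAE would then be trained to reconstruct masked regions of clouds like these.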

Pages: 142291-142305

Page count: 15
