MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

Cited by: 15

Authors:

Liang, Yaqian [1]

Zhao, Shanshan [2]

Yu, Baosheng [3]

Zhang, Jing [3]

He, Fazhi [1]

Affiliations:

[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China

[2] JD Explore Acad, Beijing, Peoples R China

[3] Univ Sydney, Sch Comp Sci, Sydney, NSW, Australia

Source:

COMPUTER VISION - ECCV 2022, PT III | 2022 / Vol. 13663

Funding:

National Natural Science Foundation of China;

Keywords:

Transformer; Masked autoencoding; 3D mesh analysis; Self-supervised pre-training;

DOI:

10.1007/978-3-031-20062-5_3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, self-supervised pre-training has advanced Vision Transformers on various tasks across different data modalities, e.g., image and 3D point cloud data. In this paper, we explore this learning paradigm for 3D mesh data analysis based on Transformers. Since applying Transformer architectures to new modalities is usually non-trivial, we first adapt the Vision Transformer to 3D mesh data processing, i.e., Mesh Transformer. Specifically, we divide a mesh into several non-overlapping local patches, each containing the same number of faces, and use the 3D position of each patch's center point to form positional embeddings. Inspired by MAE, we explore how pre-training on 3D mesh data with the Transformer-based structure benefits downstream 3D mesh analysis tasks. We first randomly mask some patches of the mesh and feed the corrupted mesh into Mesh Transformers. Then, by reconstructing the information of the masked patches, the network learns discriminative representations for mesh data. We name our method MeshMAE; it yields state-of-the-art or comparable performance on mesh analysis tasks, i.e., classification and segmentation. In addition, we conduct comprehensive ablation studies to show the effectiveness of key designs in our method.
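
The patching and masking steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `patch_centers` and `random_patch_mask`, the toy random face centroids, the patch size of 64 faces, and the mask ratio of 0.5 are all assumptions chosen for illustration, and the sketch assumes faces are already ordered so that consecutive faces form one local patch.

```python
import numpy as np


def patch_centers(face_centers, faces_per_patch):
    """Group consecutive faces into equal-size patches and return each patch center.

    Assumption: faces are pre-ordered so that every run of `faces_per_patch`
    consecutive faces forms one non-overlapping local patch.
    """
    num_faces = face_centers.shape[0]
    if num_faces % faces_per_patch != 0:
        raise ValueError("number of faces must be divisible by faces_per_patch")
    patches = face_centers.reshape(-1, faces_per_patch, 3)
    # The mean of the face centroids serves as the patch's 3D position,
    # which could then feed a positional embedding.
    return patches.mean(axis=1)


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean array that is True at randomly chosen masked patches."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[masked_idx] = True
    return mask


rng = np.random.default_rng(0)
face_centers = rng.random((512, 3))  # toy stand-in for face centroids of a mesh
centers = patch_centers(face_centers, faces_per_patch=64)
mask = random_patch_mask(len(centers), mask_ratio=0.5, rng=rng)
visible_centers = centers[~mask]  # positions of patches an encoder would see
```

Here only the visible patches would be passed to an encoder, while the masked ones become reconstruction targets; how the patches are actually built and what is reconstructed is specified in the paper, not in this sketch.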


Pages: 37-54

Number of pages: 18

50 items in total

[1] Masked Autoencoders in 3D Point Cloud Representation Learning
Jiang, Jincen
Lu, Xuequan
Zhao, Lizhi
Dazeley, Richard
Wang, Meili
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 820 - 831
[2] Variational Autoencoders for Deforming 3D Mesh Models
Tan, Qingyang
Gao, Lin
Lai, Yu-Kun
Xia, Shihong
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5841 - 5850
[3] Scene Graph Masked Variational Autoencoders for 3D Scene Generation
Xu, Rui
Hui, Le
Han, Yuehui
Qian, Jianjun
Xie, Jin
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5725 - 5733
[4] Privacy Protection in MRI Scans Using 3D Masked Autoencoders
Van der Goten, Lennart A.
Smith, Kevin
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 583 - 592
[5] MGM-AE: Self-Supervised Learning on 3D Shape Using Mesh Graph Masked Autoencoders
Yang, Zhangsihao
Ding, Kaize
Liu, Huan
Wang, Yalin
2024 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV 2024, 2024, : 3291 - 3301
[6] Generating 3D Faces Using Convolutional Mesh Autoencoders
Ranjan, Anurag
Bolkart, Timo
Sanyal, Soubhik
Black, Michael J.
COMPUTER VISION - ECCV 2018, PT III, 2018, 11207 : 725 - 741
[7] Variational autoencoders for 3D data processing
Szilárd Molnár
Levente Tamás
Artificial Intelligence Review, 57
[8] Variational autoencoders for 3D data processing
Molnar, Szilard
Tamas, Levente
ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (02)
[9] MATE: Masked Autoencoders are Online 3D Test-Time Learners
Mirza, M. Jehanzeb
Shin, Inkyu
Lin, Wei
Schriebl, Andreas
Sun, Kunyang
Choe, Jaesung
Kozinski, Mateusz
Possegger, Horst
Kweon, In So
Yoon, Kuk-Jin
Bischof, Horst
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 16663 - 16672
[10] A Novel SO(3) Rotational Equivariant Masked Autoencoder for 3D Mesh Object Analysis
Xie, Min
Zhao, Jieyu
Shen, Kedi
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 329 - 342
