MeshCLIP: Efficient cross-modal information processing for 3D mesh data in zero/few-shot learning

Cited by: 6
Authors
Song, Yupeng [1 ,2 ]
Liang, Naifu [1 ]
Guo, Qing [1 ]
Dai, Jicheng [1 ]
Bai, Junwei [1 ,2 ]
He, Fazhi [1 ,2 ]
Affiliations
[1] Wuhan University, School of Computer Science, Wuhan 430072, People's Republic of China
[2] Wuhan University, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan 430072, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
3D mesh processing; Cross-modal learning; Zero-shot learning; Few-shot learning
DOI
10.1016/j.ipm.2023.103497
CLC number
TP [Automation and computer technology]
Subject classification code
0812
Abstract
Text, 2D, and 3D information are crucial representations in modern science and management disciplines. However, 3D data are complex and irregular, and their scarcity and high generation cost limit their processing and application. In this paper, we present MeshCLIP, a new cross-modal information learning paradigm that processes 3D mesh data directly, end-to-end, in a zero/few-shot manner. Specifically, we design a novel pipeline based on visual factors and graphics principles that bridges the gap between 3D mesh data and other modalities, joining 2D/3D visual and textual information for zero/few-shot learning. We then construct a self-attention adapter that learns key 3D mesh information from only a few training priors, significantly improving the model's discriminative ability. Extensive experiments demonstrate that the proposed MeshCLIP achieves state-of-the-art results on multiple challenging 3D mesh datasets. Across the 3D domain, the proposed zero-shot approach significantly outperforms existing 3D representation methods, reaching 3x their accuracy (a 41.5% increase) on the ModelNet40 dataset. Furthermore, in few-shot learning, MeshCLIP uses only a few supervised priors (less than 10% of the sample size) to achieve results close to those of methods trained on the full dataset.
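To make the cross-modal idea in the abstract concrete, the following is a minimal sketch of a CLIP-based zero-shot pipeline for meshes, not the paper's actual implementation. It assumes OpenAI's open-source `clip` package, that mesh views have already been rasterized and preprocessed into image tensors (e.g., by a hypothetical rendering helper), and that the prompt template and mean-pooling of views are illustrative choices.

```python
# Hedged sketch: zero-shot mesh classification via rendered views + CLIP.
# Assumptions (not from the paper): OpenAI's `clip` package; `view_images`
# is a (V, 3, 224, 224) tensor of rendered mesh views already normalized
# with CLIP's `preprocess` transform.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def zero_shot_classify(view_images, class_names):
    """Score one mesh (given as rendered views) against text prompts."""
    prompts = clip.tokenize(
        [f"a 3D rendering of a {name}" for name in class_names]
    ).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(view_images.to(device))  # (V, D)
        txt_feat = model.encode_text(prompts)                  # (C, D)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # Aggregate views by averaging, then compare with each class prompt.
    mesh_feat = img_feat.mean(dim=0, keepdim=True)             # (1, D)
    logits = 100.0 * mesh_feat @ txt_feat.T                    # (1, C)
    return logits.softmax(dim=-1)
```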
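The abstract also mentions a self-attention adapter trained from only a few priors. Below is a hedged PyTorch sketch of what such an adapter could look like on top of frozen CLIP view features; the layer sizes, multi-head attention, view pooling, and the residual blend `alpha` are all assumptions for illustration, not the paper's specification.

```python
# Hedged sketch: a small trainable self-attention adapter over frozen
# per-view CLIP features, suitable for few-shot fitting. All design
# choices here (dim, n_heads, alpha, mean pooling) are illustrative.
import torch
import torch.nn as nn

class SelfAttentionAdapter(nn.Module):
    def __init__(self, dim=512, n_heads=8, alpha=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)
        self.alpha = alpha  # mix between adapted and frozen features

    def forward(self, view_feats):
        # view_feats: (B, V, D) frozen CLIP features of V rendered views.
        attended, _ = self.attn(view_feats, view_feats, view_feats)
        adapted = self.proj(attended).mean(dim=1)   # pool views -> (B, D)
        frozen = view_feats.mean(dim=1)             # (B, D)
        return self.alpha * adapted + (1 - self.alpha) * frozen
```

Only the adapter's parameters would be optimized on the few labelled samples, keeping the large pretrained encoder frozen, which is consistent with the few-shot setting the abstract describes.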
Pages: 17