MeshCLIP: Efficient cross-modal information processing for 3D mesh data in zero/few-shot learning

Cited by: 6
Authors
Song, Yupeng [1 ,2 ]
Liang, Naifu [1 ]
Guo, Qing [1 ]
Dai, Jicheng [1 ]
Bai, Junwei [1 ,2 ]
He, Fazhi [1 ,2 ]
Affiliations
[1] Wuhan University, School of Computer Science, Wuhan 430072, People's Republic of China
[2] Wuhan University, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan 430072, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
3D mesh processing; Cross-modal learning; Zero-shot learning; Few-shot learning
DOI
10.1016/j.ipm.2023.103497
CLC number
TP [Automation and computer technology]
Subject classification code
0812
Abstract
Text, 2D, and 3D information are crucial representations in modern science and management disciplines. However, 3D data are complex and irregular, and their scarcity and high generation cost limit their processing and application. In this paper, we present MeshCLIP, a new cross-modal information learning paradigm that processes 3D mesh data directly, end-to-end, in a zero/few-shot manner. Specifically, we design a novel pipeline based on visual factors and graphics principles that bridges the gap between 3D mesh data and other modalities, joining 2D/3D visual and textual information for zero/few-shot learning. We then construct a self-attention adapter that learns key 3D mesh information from only a few training priors, significantly improving the model's discriminative ability. Extensive experiments demonstrate that the proposed MeshCLIP achieves state-of-the-art results on multiple challenging 3D mesh datasets. Across the 3D domain, the proposed zero-shot approach significantly outperforms existing 3D representation methods, reaching 3x their accuracy (a 41.5% increase) on the ModelNet40 dataset. Furthermore, in few-shot learning, MeshCLIP uses only a few supervised priors (less than 10% of the sample size) to achieve results close to those of methods trained on the full dataset.
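To make the cross-modal idea in the abstract concrete, the following is a minimal sketch of a CLIP-based zero-shot pipeline for meshes, not the paper's actual implementation. It assumes OpenAI's open-source `clip` package, that mesh views have already been rasterized and preprocessed into image tensors (e.g., by a hypothetical rendering helper), and that the prompt template and mean-pooling of views are illustrative choices.

```python
# Hedged sketch: zero-shot mesh classification via rendered views + CLIP.
# Assumptions (not from the paper): OpenAI's `clip` package; `view_images`
# is a (V, 3, 224, 224) tensor of rendered mesh views already normalized
# with CLIP's `preprocess` transform.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def zero_shot_classify(view_images, class_names):
    """Score one mesh (given as rendered views) against text prompts."""
    prompts = clip.tokenize(
        [f"a 3D rendering of a {name}" for name in class_names]
    ).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(view_images.to(device))  # (V, D)
        txt_feat = model.encode_text(prompts)                  # (C, D)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # Aggregate views by averaging, then compare with each class prompt.
    mesh_feat = img_feat.mean(dim=0, keepdim=True)             # (1, D)
    logits = 100.0 * mesh_feat @ txt_feat.T                    # (1, C)
    return logits.softmax(dim=-1)
```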
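The abstract also mentions a self-attention adapter trained from only a few priors. Below is a hedged PyTorch sketch of what such an adapter could look like on top of frozen CLIP view features; the layer sizes, multi-head attention, view pooling, and the residual blend `alpha` are all assumptions for illustration, not the paper's specification.

```python
# Hedged sketch: a small trainable self-attention adapter over frozen
# per-view CLIP features, suitable for few-shot fitting. All design
# choices here (dim, n_heads, alpha, mean pooling) are illustrative.
import torch
import torch.nn as nn

class SelfAttentionAdapter(nn.Module):
    def __init__(self, dim=512, n_heads=8, alpha=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)
        self.alpha = alpha  # mix between adapted and frozen features

    def forward(self, view_feats):
        # view_feats: (B, V, D) frozen CLIP features of V rendered views.
        attended, _ = self.attn(view_feats, view_feats, view_feats)
        adapted = self.proj(attended).mean(dim=1)   # pool views -> (B, D)
        frozen = view_feats.mean(dim=1)             # (B, D)
        return self.alpha * adapted + (1 - self.alpha) * frozen
```

Only the adapter's parameters would be optimized on the few labelled samples, keeping the large pretrained encoder frozen, which is consistent with the few-shot setting the abstract describes.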
Pages: 17