T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance

Cited: 0
Authors
Nie, Weizhi [1 ]
Chen, Ruidong [1 ]
Wang, Weijie [2 ]
Lepri, Bruno [3 ]
Sebe, Nicu [2 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300384, Peoples R China
[2] Univ Trento, Dept Informat Engn & Comp Sci, I-38122 Trento, Italy
[3] Fdn Bruno Kessler, I-38122 Trento, Italy
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Solid modeling; Shape; Data models; Knowledge graphs; Legged locomotion; Natural languages; 3D model generation; causal model inference; cross-modal representation; knowledge graph; natural language;
DOI
10.1109/TPAMI.2024.3463753
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, 3D models have been used in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, the available 3D model data are too scarce to meet practical demand. Efficiently generating high-quality 3D models from textual descriptions is therefore a promising but challenging way to address this gap. In this paper, inspired by the creative mechanism of human imagination, which concretizes a target from an ambiguous description by drawing on experiential knowledge, we propose a novel text-3D generation model (T2TD). T2TD generates the target model from a textual description with the aid of experiential knowledge, and its creation process simulates the human imaginative mechanism. First, we introduce a text-3D knowledge graph that preserves the relationship between 3D models and textual semantic information, providing related shapes analogous to human experiential knowledge. Second, we propose an effective causal inference model that selects useful features from these related shapes, removing unrelated structural information and retaining only the features strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure that progressively fuses this strongly related structural information with the textual information, compensating for the lack of structural detail and enhancing the final performance of the 3D generation model. Experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms SOTA methods on the Text2Shape datasets.
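As a rough illustration only (not the authors' implementation, whose details are not given in this record), the fusion step described in the abstract can be pictured as a cross-attention layer in which text-token features query structure features of shapes retrieved from the knowledge graph; all dimensions, weights, and names below are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(text_feat, shape_feat, d_k=64, seed=0):
    """One illustrative fusion layer: text tokens (queries) attend to
    retrieved shape features (keys/values), then add a residual."""
    rng = np.random.default_rng(seed)
    d = text_feat.shape[-1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)  # query projection
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)  # key projection
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)    # value projection
    Q = text_feat @ Wq                     # (T, d_k)
    K = shape_feat @ Wk                    # (S, d_k)
    V = shape_feat @ Wv                    # (S, d)
    attn = softmax(Q @ K.T / np.sqrt(d_k)) # (T, S) attention weights
    return text_feat + attn @ V            # residual fusion, (T, d)

text = np.random.default_rng(1).standard_normal((8, 128))    # 8 text tokens
shapes = np.random.default_rng(2).standard_normal((4, 128))  # 4 retrieved shapes
fused = cross_attention_fuse(text, shapes)
print(fused.shape)  # prints (8, 128)
```

In the paper's terms, stacking several such layers would progressively inject structural information from the retrieved shapes into the text representation before decoding a 3D model.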
Pages: 172-189 (18 pages)