T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance

Cited: 0
Authors
Nie, Weizhi [1 ]
Chen, Ruidong [1 ]
Wang, Weijie [2 ]
Lepri, Bruno [3 ]
Sebe, Nicu [2 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300384, Peoples R China
[2] Univ Trento, Dept Informat Engn & Comp Sci, I-38122 Trento, Italy
[3] Fdn Bruno Kessler, I-38122 Trento, Italy
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Solid modeling; Shape; Data models; Knowledge graphs; Legged locomotion; Natural languages; 3D model generation; causal model inference; cross-modal representation; knowledge graph; natural language;
DOI
10.1109/TPAMI.2024.3463753
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, 3D models have been used in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, the available 3D model data are too scarce to meet practical demand. Efficiently generating high-quality 3D models from textual descriptions is therefore a promising but challenging way to address this gap. In this paper, inspired by the creative mechanism of human imagination, which concretizes a target from an ambiguous description by drawing on experiential knowledge, we propose a novel text-3D generation model (T2TD). T2TD generates the target model from a textual description with the aid of experiential knowledge, and its creation process simulates the human imaginative mechanism. First, we introduce a text-3D knowledge graph that preserves the relationship between 3D models and textual semantic information, providing related shapes analogous to human experiential knowledge. Second, we propose an effective causal inference model that selects useful features from these related shapes, removing unrelated structural information and retaining only the features strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure that progressively fuses this strongly related structural information with the textual information, compensating for the lack of structural detail and enhancing the final performance of the 3D generation model. Experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms SOTA methods on the Text2Shape datasets.
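As a rough illustration only (not the authors' implementation, whose details are not given in this record), the fusion step described in the abstract can be pictured as a cross-attention layer in which text-token features query structure features of shapes retrieved from the knowledge graph; all dimensions, weights, and names below are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(text_feat, shape_feat, d_k=64, seed=0):
    """One illustrative fusion layer: text tokens (queries) attend to
    retrieved shape features (keys/values), then add a residual."""
    rng = np.random.default_rng(seed)
    d = text_feat.shape[-1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)  # query projection
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)  # key projection
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)    # value projection
    Q = text_feat @ Wq                     # (T, d_k)
    K = shape_feat @ Wk                    # (S, d_k)
    V = shape_feat @ Wv                    # (S, d)
    attn = softmax(Q @ K.T / np.sqrt(d_k)) # (T, S) attention weights
    return text_feat + attn @ V            # residual fusion, (T, d)

text = np.random.default_rng(1).standard_normal((8, 128))    # 8 text tokens
shapes = np.random.default_rng(2).standard_normal((4, 128))  # 4 retrieved shapes
fused = cross_attention_fuse(text, shapes)
print(fused.shape)  # prints (8, 128)
```

In the paper's terms, stacking several such layers would progressively inject structural information from the retrieved shapes into the text representation before decoding a 3D model.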
Pages: 172-189 (18 pages)