CPG3D: Cross-Modal Priors Guided 3D Object Reconstruction

Times Cited: 10
Authors
Nie, Weizhi [1 ]
Jiao, Chuanqi [1 ]
Chang, Rihao [1 ,2 ]
Qu, Lei [3 ]
Liu, An-An [1 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Ocean Univ China, Sch Informat Sci & Engn, Qingdao 266100, Shandong, Peoples R China
[3] Hisense Grp Holdings Co Ltd, Qingdao 266000, Peoples R China
Keywords
Three-dimensional displays; Shape; Image reconstruction; Feature extraction; Solid modeling; Task analysis; Data mining; 3D model reconstruction; Multimodal learning; Cross-modal retrieval; SHAPE; NETWORK
DOI
10.1109/TMM.2023.3251697
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Three-dimensional reconstruction is a multimedia technology widely used in computer-aided modeling and 3D animation. Nevertheless, it remains difficult for reconstruction methods to cope with missing 3D geometry and object occlusion in single-view images. In this article, we propose a novel method (CPG3D) for reconstructing high-quality 3D shapes from a single image under the guidance of prior knowledge. Using the single-view image as the query, prior knowledge is collected from public 3D datasets; it compensates for missing 3D geometry and helps the 3D reconstruction network produce high-fidelity results. Our method consists of three parts: 1) Cross-modal 3D shape retrieval module: this part retrieves related 3D shapes based on the 2D image, applying a pre-trained model to guarantee the correlation between the retrieved 3D shape and the input image. 2) Multimodal information fusion module: we propose a multimodal attention mechanism to fuse 2D visual and 3D structural information. 3) Three-dimensional reconstruction module: we propose a novel encoder-decoder network for 3D shape reconstruction; specifically, skip connections link the target image's visual information with the 3D model's structural information to sharpen the prediction of 3D details. During training, two carefully designed loss functions guide the multimodal learning toward proper modal features. Experiments on the ShapeNet and Pix3D datasets show that our method notably improves reconstruction quality and outperforms state-of-the-art methods.
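To make the fusion step concrete, below is a minimal PyTorch sketch of the kind of cross-modal attention fusion the abstract describes: image features act as queries and the retrieved 3D prior features act as keys and values, with a residual connection echoing the skip-connection idea used in the reconstruction decoder. The module name CrossModalFusion, the tensor shapes, and the use of nn.MultiheadAttention are illustrative assumptions, not details taken from the paper's implementation.

# Hypothetical sketch of cross-modal attention fusion (not the authors' code).
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse 2D image tokens (query) with retrieved 3D prior tokens (key/value)."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feat: torch.Tensor, prior_feat: torch.Tensor) -> torch.Tensor:
        # img_feat:   (B, N_img, dim) tokens from the single-view image encoder
        # prior_feat: (B, N_3d,  dim) tokens from the retrieved 3D shape prior
        fused, _ = self.attn(query=img_feat, key=prior_feat, value=prior_feat)
        # Residual connection preserves the original visual information.
        return self.norm(img_feat + fused)

if __name__ == "__main__":
    fusion = CrossModalFusion()
    img = torch.randn(2, 196, 256)     # e.g. a 14x14 image feature map, flattened
    prior = torch.randn(2, 1024, 256)  # e.g. sampled point / voxel tokens
    print(fusion(img, prior).shape)    # torch.Size([2, 196, 256])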
Pages: 9383-9396
Number of Pages: 14