Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images

Cited: 145
Authors
Xie, Haozhe [1 ,2 ,4 ]
Yao, Hongxun [1 ,2 ]
Zhang, Shengping [3 ,7 ]
Zhou, Shangchen [6 ]
Sun, Wenxiu [5 ]
Affiliations
[1] Harbin Inst Technol, State Key Lab Robot & Syst, Harbin, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Weihai, Peoples R China
[4] SenseTime Res, Shenzhen, Peoples R China
[5] SenseTime Res, Hong Kong, Peoples R China
[6] Nanyang Technol Univ, Singapore, Singapore
[7] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D object reconstruction; Multi-scale; Context-aware; Convolutional neural network; SHAPE;
DOI
10.1007/s11263-020-01347-6
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recovering the 3D shape of an object from single or multiple images with deep neural networks has attracted increasing attention in the past few years. Mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse feature maps of input images. However, RNN-based approaches are unable to produce consistent reconstruction results when given the same input images in different orders. Moreover, RNNs may forget important features from early input images due to long-term memory loss. To address these issues, we propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes, yielding a fused 3D volume. To further correct incorrectly recovered parts in the fused 3D volume, a refiner is adopted to generate the final output. Experimental results on the ShapeNet, Pix3D, and Things3D benchmarks show that Pix2Vox++ performs favorably against state-of-the-art methods in terms of both accuracy and efficiency.
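The fusion step described above can be sketched in a few lines: given per-view coarse volumes and per-voxel context scores, a softmax across views turns the scores into selection weights, so the fused volume adaptively takes each part from the view that reconstructs it best. This is a minimal illustrative sketch in NumPy, not the paper's implementation; the function name, array shapes, and the use of random toy inputs are all assumptions made for the example.

```python
import numpy as np

def fuse_coarse_volumes(volumes, scores):
    """Fuse per-view coarse 3D volumes using per-voxel context scores.

    volumes: (n_views, D, H, W) coarse occupancy volumes in [0, 1]
    scores:  (n_views, D, H, W) unnormalized per-voxel context scores

    A softmax over the view axis converts scores into per-voxel weights,
    and the fused volume is the weighted sum, so higher-scoring views
    dominate the parts of the object they reconstruct well.
    """
    # Numerically stable softmax across the view axis (axis 0).
    s = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(s) / np.exp(s).sum(axis=0, keepdims=True)
    return (weights * volumes).sum(axis=0)

# Toy example: two views of a tiny 2x2x2 grid.
rng = np.random.default_rng(0)
vols = rng.uniform(size=(2, 2, 2, 2))   # two coarse volumes
scr = rng.normal(size=(2, 2, 2, 2))     # two score volumes
fused = fuse_coarse_volumes(vols, scr)
print(fused.shape)  # (2, 2, 2)
```

Because the weights form a convex combination at every voxel, the fused value always lies between the per-view values there; with equal scores the fusion reduces to a plain average over views.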
Pages: 2919-2935
Page count: 17
Cited References
56 in total
[1] [Anonymous]. Advances in Neural Information Processing Systems, 2019.
[2] Barron, Jonathan T.; Malik, Jitendra. Shape, Illumination, and Reflectance from Shading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(8): 1670-1687.
[3] Cadena, Cesar; Carlone, Luca; Carrillo, Henry; Latif, Yasir; Scaramuzza, Davide; Neira, Jose; Reid, Ian; Leonard, John J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Transactions on Robotics, 2016, 32(6): 1309-1332.
[4] Chang, Angel X. 2015, arXiv.
[5] Chen, Zhiqin; Zhang, Hao. Learning Implicit Fields for Generative Shape Modeling. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 5932-5941.
[6] Choy, Christopher B.; Xu, Danfei; Gwak, Jun Young; Chen, Kevin; Savarese, Silvio. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction. Computer Vision - ECCV 2016, Part VIII, 2016, 9912: 628-644.
[7] Dibra, Endri; Jain, Himanshu; Oztireli, Cengiz; Ziegler, Remo; Gross, Markus. Human Shape from Silhouettes using Generative HKS Descriptors and Cross-Modal Neural Networks. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 5504-5514.
[8] Fan, Haoqiang; Su, Hao; Guibas, Leonidas. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 2463-2471.
[9] Fuentes-Pacheco, Jorge; Ruiz-Ascencio, Jose; Manuel Rendon-Mancha, Juan. Visual simultaneous localization and mapping: a survey. Artificial Intelligence Review, 2015, 43(1): 55-81.
[10] Goodfellow, I. J. Advances in Neural Information Processing Systems, 2014, 27: 2672.