What Do Single-view 3D Reconstruction Networks Learn?

Cited by: 312
Authors
Tatarchenko, Maxim [1]
Richter, Stephan R. [2]
Ranftl, Rene [2]
Li, Zhuwen [2]
Koltun, Vladlen [2]
Brox, Thomas [1]
Affiliations
[1] Univ Freiburg, Freiburg, Germany
[2] Intel Labs, Santa Clara, CA USA
Source
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) | 2019
Keywords
SHAPE;
DOI
10.1109/CVPR.2019.00352
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional networks for single-view object reconstruction have shown impressive performance and have become a popular subject of research. All existing techniques are united by the idea of having an encoder-decoder network that performs non-trivial reasoning about the 3D structure of the output space. In this work, we set up two alternative approaches that perform image classification and retrieval respectively. These simple baselines yield better results than state-of-the-art methods, both qualitatively and quantitatively. We show that encoder-decoder methods are statistically indistinguishable from these baselines, thus indicating that the current state of the art in single-view object reconstruction does not actually perform reconstruction but image classification. We identify aspects of popular experimental procedures that elicit this behavior and discuss ways to improve the current state of research.
Pages: 3400-3409
Number of pages: 10