Generic 3D Representation via Pose Estimation and Matching

Cited by: 34
Authors
Zamir, Amir R. [1]
Wekel, Tilman [1]
Agrawal, Pulkit [2]
Wei, Colin [1]
Malik, Jitendra [2]
Savarese, Silvio [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
COMPUTER VISION - ECCV 2016, PT III | 2016 / Vol. 9907
Keywords
Generic vision; Representation; Descriptor learning; Pose estimation; Wide-baseline matching; Street view;
DOI
10.1007/978-3-319-46487-9_33
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate that our representation achieves state-of-the-art wide baseline feature matching results without requiring a priori rectification (unlike SIFT and the majority of learnt features). We also show 6DOF camera pose estimation given a pair of local image patches. The accuracy on both supervised tasks is comparable to that of humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion on the learned representation and open research questions.
Pages: 535-553
Number of pages: 19