Unsupervised Learning of Depth and Pose Estimation based on Continuous Frame Window

Cited by: 0
Authors
Shang, Suning [1]
Wang, Huaimin [1]
Zhang, Pengfei [1]
Ding, Bo [1]
Affiliations
[1] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Proc, Changsha 410073, Peoples R China
Source
2018 International Joint Conference on Neural Networks (IJCNN) | 2018
Funding
National Natural Science Foundation of China
Keywords
Depth estimation; Pose estimation; Unsupervised learning; Image reconstruction
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present an unsupervised learning framework for monocular depth and camera motion estimation from video sequences. In common with recent work, we use an unsupervised end-to-end learning method that requires monocular video sequences for training. The difference is that our approach uses image reconstruction as the supervisory signal and also exploits the pose estimation method used in traditional SLAM approaches to strengthen that signal and add training constraints. For pose estimation, a continuous frame window is set to construct the pose graph. Our method uses single-view depth and multi-view pose networks, with a loss based on reconstructing nearby images to the target view using the predicted depth and pose. The networks are thus coupled by the loss during training but can be applied independently at test time. Our experiments on the KITTI dataset demonstrate the effectiveness of our method: 1) monocular depth estimation outperforms both the supervised methods that use ground-truth depth data for training and the existing unsupervised learning method, and performs comparably with the supervised methods that use ground-truth pose data for training; 2) pose estimation performs almost on par with established SLAM systems under comparable input settings.
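The reconstruction loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera, grayscale images stored as nested lists, and nearest-neighbour sampling in place of whatever differentiable sampling the networks use. The names `reproject_pixel` and `photometric_l1` are hypothetical, as are all parameter names.

```python
def reproject_pixel(u, v, depth, fx, fy, cx, cy, R, t):
    """Map a target pixel (u, v) with known depth into the source view.

    R (3x3 nested list) and t (length-3 list) are an assumed rigid
    transform from the target camera frame to the source camera frame.
    """
    # Back-project the pixel to a 3D point in the target camera frame.
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    Z = depth
    # Move the point into the source camera frame.
    Xs = R[0][0] * X + R[0][1] * Y + R[0][2] * Z + t[0]
    Ys = R[1][0] * X + R[1][1] * Y + R[1][2] * Z + t[1]
    Zs = R[2][0] * X + R[2][1] * Y + R[2][2] * Z + t[2]
    if Zs <= 0:
        return None  # point lies behind the source camera
    # Perspective projection into the source image.
    return fx * Xs / Zs + cx, fy * Ys / Zs + cy


def photometric_l1(target, source, depth, intrinsics, R, t):
    """Mean absolute intensity difference between the target image and
    the source image sampled at the reprojected pixel locations."""
    fx, fy, cx, cy = intrinsics
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            uv = reproject_pixel(u, v, depth[v][u], fx, fy, cx, cy, R, t)
            if uv is None:
                continue
            ui, vi = round(uv[0]), round(uv[1])  # nearest-neighbour sample
            if 0 <= ui < w and 0 <= vi < h:
                total += abs(target[v][u] - source[vi][ui])
                count += 1
    # Pixels that project outside the source image are ignored.
    return total / count if count else 0.0


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
target = [[10, 20, 30, 40]]
source = [[0, 10, 20, 30]]      # same scene, shifted one pixel to the right
depth = [[2.0, 2.0, 2.0, 2.0]]
intrinsics = (2.0, 2.0, 0.0, 0.0)

loss_correct_pose = photometric_l1(target, source, depth, intrinsics, IDENTITY, [1.0, 0.0, 0.0])
loss_wrong_pose = photometric_l1(target, source, depth, intrinsics, IDENTITY, [0.0, 0.0, 0.0])
```

In this toy setup the loss is lower when the depth and pose explain the shift between the two views, which is the sense in which such a loss couples depth and pose predictions during training.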
Pages: 8