Bags of tricks for learning depth and camera motion from monocular videos

Cited by: 0
Authors
Dong B. [1 ]
Sheng L. [2 ]
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin
[2] College of Software, Beihang University, Beijing
Source
Virtual Reality and Intelligent Hardware | 2019, Vol. 1, No. 5
Keywords
Monocular visual odometry; Unsupervised learning
DOI
10.1016/j.vrih.2019.09.004
Abstract
Background: Based on the seminal work of Zhou et al., much of the recent progress in learning monocular visual odometry, i.e., depth and camera motion from monocular videos, can be attributed to tricks in the training procedure, such as data augmentation and learning objectives. Methods: We categorize a collection of such tricks through theoretical examination and empirical evaluation of their effects on the final accuracy of the visual odometry. Results/Conclusions: By combining the aforementioned tricks, we were able to significantly improve a baseline model adapted from SfMLearner without additional inference cost. Furthermore, we analyze the principles behind these tricks and the reasons for their success, and present practical guidelines for future research. © 2019 Beijing Zhongke Journal Publishing Co. Ltd
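The "learning objectives" the abstract refers to typically build on the photometric reconstruction loss of Zhou et al. [1], often combined with the SSIM measure [6] in later work. Below is a minimal NumPy sketch of such an objective; it is an illustrative assumption based on common practice, not the authors' exact formulation, and it uses a simplified global SSIM where real pipelines compute SSIM over local windows. The `alpha` weight of 0.85 follows the value popularized by Godard et al. [10].

```python
import numpy as np

def photometric_loss(target, warped, alpha=0.85, c1=0.01**2, c2=0.03**2):
    """Weighted SSIM + L1 photometric loss between a target frame and a
    view synthesized by warping a source frame (images as float arrays
    in [0, 1]). Sketch only: SSIM is computed globally here, whereas
    typical implementations use a local (e.g. 3x3) window.
    """
    # L1 photometric difference
    l1 = np.mean(np.abs(target - warped))

    # Simplified global SSIM between the two images
    mu_x, mu_y = target.mean(), warped.mean()
    var_x, var_y = target.var(), warped.var()
    cov = ((target - mu_x) * (warped - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    )
    ssim_loss = (1.0 - ssim) / 2.0  # maps SSIM in [-1, 1] to a loss in [0, 1]

    return alpha * ssim_loss + (1.0 - alpha) * l1
```

In training, `warped` would come from inverse-warping a neighboring frame using the predicted depth and camera motion, so minimizing this loss drives both predictions jointly.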
Pages: 500-510
Page count: 10
References
21 references in total
[1]  
Zhou T.H., Brown M., Snavely N., Lowe D.G., Unsupervised learning of depth and ego-motion from video, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017)
[2]  
Wang C.Y., Buenaposada J.M., Zhu R., Lucey S., Learning depth from monocular videos using direct methods, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018)
[3]  
Tang C., Tan P., BA-Net: Dense Bundle Adjustment Networks, International Conference on Learning Representations, (2019)
[4]  
Geiger A., Lenz P., Stiller C., Urtasun R., Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research, 32, 11, pp. 1231-1237, (2013)
[5]  
Eigen D., Puhrsch C., Fergus R., Depth Map Prediction from a Single Image using a Multi-scale Deep Network, Advances in Neural Information Processing Systems, pp. 2366-2374, (2014)
[6]  
Wang Z., Bovik A.C., Sheikh H.R., Simoncelli E.P., Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing, 13, 4, pp. 600-612, (2004)
[7]  
Simonyan K., Zisserman A., Very Deep Convolutional Networks for Large-scale Image Recognition, (2014)
[8]  
Johnson J., Alahi A., Li F.F., Perceptual losses for real-time style transfer and super-resolution, Computer Vision - ECCV 2016, pp. 694-711, (2016)
[9]  
Casser V., Pirk S., Mahjourian R., Angelova A., Depth prediction without the sensors: leveraging structure for unsupervised learning from monocular videos, Proceedings of the AAAI Conference on Artificial Intelligence, 33, pp. 8001-8008, (2019)
[10]  
Godard C., Aodha O.M., Brostow G.J., Unsupervised monocular depth estimation with left-right consistency, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017)