Online supervised attention-based recurrent depth estimation from monocular video

Cited by: 0
Authors
Maslov D. [1 ]
Makarov I. [1 ,2 ]
Affiliations
[1] School of Data Analysis and Artificial Intelligence, HSE University, Moscow
[2] Samsung-PDMI Joint AI Center, St. Petersburg Department of Steklov Institute of Mathematics, St. Petersburg
Source
Maslov, Dmitrii (dvmaslov@edu.hse.ru) | 2020 / PeerJ Inc. / Vol. 06
Keywords
Augmented Reality; Autonomous Vehicles; Computer Science Methods; Computer Vision; Deep Convolutional Neural Networks; Depth Reconstruction; Recurrent Neural Networks
DOI
10.7717/peerj-cs.317
Abstract
Autonomous driving depends heavily on depth information for safe operation. Recently, major strides have been made in both supervised and self-supervised methods for depth reconstruction. However, most current approaches focus on single-frame depth estimation, where the achievable quality is hard to push further owing to the general limitations of supervised learning with deep neural networks. One way to improve existing methods is to exploit temporal information from frame sequences. In this paper, we study intelligent ways of integrating a recurrent block into a common supervised depth estimation pipeline. We propose a novel method that takes advantage of the convolutional gated recurrent unit (convGRU) and convolutional long short-term memory (convLSTM). We compare the use of convGRU and convLSTM blocks and determine the best model for the real-time depth estimation task. We carefully study the training strategy and provide new deep neural network architectures for depth estimation from monocular video that use information from past frames via an attention mechanism. We demonstrate the efficiency of exploiting temporal information by comparing our best recurrent method with existing image-based and video-based solutions for monocular depth reconstruction. © 2020. Maslov and Makarov. All Rights Reserved.
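The convGRU block mentioned in the abstract replaces the dense transforms of a standard GRU with convolutions, so the hidden state keeps its spatial layout while being updated frame by frame. Below is a minimal NumPy sketch of one such update step; all tensor sizes, weight names, and the naive convolution are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_same(x, w):
    """Naive 'same'-padded 2D convolution: x is (C_in, H, W), w is (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.empty((c_out, H, W))
    for i in range(H):
        for j in range(W):
            # contract (C_in, k, k) patch against every output filter
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_gru_step(x, h, params):
    """One convGRU update: the usual GRU gates, computed with convolutions."""
    Wxz, Whz, Wxr, Whr, Wxh, Whh = params
    z = sigmoid(conv2d_same(x, Wxz) + conv2d_same(h, Whz))        # update gate
    r = sigmoid(conv2d_same(x, Wxr) + conv2d_same(h, Whr))        # reset gate
    h_cand = np.tanh(conv2d_same(x, Wxh) + conv2d_same(r * h, Whh))
    return (1.0 - z) * h + z * h_cand

# Illustrative sizes: 2 input channels, 4 hidden channels, 8x8 feature maps, 3x3 kernels.
C_IN, C_H, H, W, K = 2, 4, 8, 8, 3
params = (
    rng.normal(0, 0.1, (C_H, C_IN, K, K)), rng.normal(0, 0.1, (C_H, C_H, K, K)),
    rng.normal(0, 0.1, (C_H, C_IN, K, K)), rng.normal(0, 0.1, (C_H, C_H, K, K)),
    rng.normal(0, 0.1, (C_H, C_IN, K, K)), rng.normal(0, 0.1, (C_H, C_H, K, K)),
)

h = np.zeros((C_H, H, W))
for _ in range(3):                       # short stand-in for a frame sequence
    x = rng.normal(size=(C_IN, H, W))    # stand-in for an encoder feature map
    h = conv_gru_step(x, h, params)

print(h.shape)  # hidden state keeps the spatial layout: (4, 8, 8)
```

Because each gate is a convolution, the recurrent state is itself a feature map, which is what lets such a block be dropped between the encoder and decoder of a per-frame depth network.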
Pages: 1-22
Page count: 21
References
61 references in total
[51]  
Wang R, Pizer SM, Frahm J-M., Recurrent neural network for (un-)supervised learning of monocular video visual odometry and depth, (2019)
[52]  
Xie J, Girshick R, Farhadi A., Deep3d: fully automatic 2d-to-3d video conversion with deep convolutional neural networks, European conference on computer vision, pp. 842-857, (2016)
[53]  
Shi X, Chen Z, Wang H, Yeung D-Y, Wong W-K, Woo W-C., Convolutional LSTM network: a machine learning approach for precipitation nowcasting, Advances in neural information processing systems, pp. 802-810, (2015)
[54]  
Xu D, Ricci E, Ouyang W, Wang X, Sebe N., Multi-scale continuous crfs as sequential deep networks for monocular depth estimation, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5354-5362, (2017)
[55]  
Xu D, Wang W, Tang H, Liu H, Sebe N, Ricci E., Structured attention guided convolutional neural fields for monocular depth estimation, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3917-3925, (2018)
[56]  
Yang N, Wang R, Stückler J, Cremers D., Deep virtual stereo odometry: leveraging deep depth prediction for monocular direct sparse odometry, (2018)
[57]  
Yin W, Liu Y, Shen C, Yan Y., Enforcing geometric constraints of virtual normal for depth prediction, Proceedings of the IEEE international conference on computer vision, pp. 5684-5693, (2019)
[58]  
Yin Z, Shi J., GeoNet: unsupervised learning of dense depth, optical flow and camera pose, CVPR, (2018)
[59]  
Zhang H, Shen C, Li Y, Cao Y, Liu Y, Yan Y., Exploiting temporal consistency for real-time video depth estimation, Proceedings of the IEEE international conference on computer vision, pp. 1725-1734, (2019)
[60]  
Zhou T, Brown M, Snavely N, Lowe DG., Unsupervised learning of depth and ego-motion from video, CVPR, (2017)