Don't Forget The Past: Recurrent Depth Estimation from Monocular Video

Cited by: 91
Authors
Patil, Vaishakh [1]
Van Gansbeke, Wouter [2]
Dai, Dengxin [1]
Van Gool, Luc [1,2]
Affiliations
[1] Swiss Fed Inst Technol, TRACE Zurich, Comp Vis Lab, CH-8092 Zurich, Switzerland
[2] Katholieke Univ Leuven, Toyota TRACE Leuven, Dept Elect Engn ESAT, B-3001 Leuven, Belgium
Keywords
Deep learning for visual perception; RGBD perception; sensor fusion; novel deep learning methods; autonomous vehicle navigation; prediction
DOI
10.1109/LRA.2020.3017478
CLC number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Autonomous cars need continuously updated depth information. Thus far, depth is mostly estimated independently for a single frame at a time, even if the method starts from video input. Our method produces a time series of depth maps, which makes it an ideal candidate for online learning approaches. In particular, we put three different types of depth estimation (supervised depth prediction, self-supervised depth prediction, and self-supervised depth completion) into a common framework. We integrate the corresponding networks with a ConvLSTM such that the spatiotemporal structure of depth across frames can be exploited to yield more accurate depth estimates. Our method is flexible. It can be applied to monocular videos only or be combined with different types of sparse depth patterns. We carefully study the architecture of the recurrent network and its training strategy. We are the first to successfully exploit recurrent networks for real-time self-supervised monocular depth estimation and completion. Extensive experiments show that our recurrent method outperforms its image-based counterpart consistently and significantly in both self-supervised scenarios. It also outperforms previous depth estimation methods of the three popular groups. Please refer to our webpage(1) for details.
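To illustrate the architectural idea summarized in the abstract (a ConvLSTM placed inside the depth network so that features from previous frames inform the current prediction), the following is a minimal PyTorch sketch. It is not the authors' implementation: the ConvLSTMCell class, the channel sizes, and the stand-in encoder features are illustrative assumptions only.

```python
# Minimal sketch (not the paper's code): a ConvLSTM cell between an encoder and a
# depth decoder, carrying a hidden state from frame to frame of a monocular video.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Single ConvLSTM cell: a convolutional analogue of an LSTM cell."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # One convolution produces the input, forget, output, and candidate gates.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

# Hypothetical usage over a short clip: the recurrent state accumulates
# spatiotemporal depth structure across frames.
B, C, H, W, T = 2, 64, 24, 80, 4                # batch, channels, feature size, frames
cell = ConvLSTMCell(in_channels=C, hidden_channels=C)
state = (torch.zeros(B, C, H, W), torch.zeros(B, C, H, W))
for t in range(T):
    encoder_features = torch.randn(B, C, H, W)  # stand-in for encoder output at frame t
    fused, state = cell(encoder_features, state)
    # `fused` would then be passed to the depth decoder for frame t.
```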
Pages: 6813-6820
Number of pages: 8