MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera

Times cited: 65
Authors
Wimbauer, Felix [1]
Yang, Nan [1, 2]
von Stumberg, Lukas [1]
Zeller, Niclas [1, 2]
Cremers, Daniel [1, 2]
Affiliations
[1] Technical University of Munich, Munich, Germany
[2] Artisense, Palo Alto, CA USA
Source
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021) | 2021
DOI
10.1109/CVPR46437.2021.00605
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose MonoRec, a semi-supervised monocular dense reconstruction architecture that predicts depth maps from a single moving camera in dynamic environments. MonoRec is based on a multi-view stereo setting that encodes the information of multiple consecutive images in a cost volume. To deal with dynamic objects in the scene, we introduce a MaskModule that predicts moving-object masks by leveraging the photometric inconsistencies encoded in the cost volumes. Unlike other multi-view stereo methods, MonoRec is able to reconstruct both static and moving objects by leveraging the predicted masks. Furthermore, we present a novel multi-stage training scheme with a semi-supervised loss formulation that does not require LiDAR depth values. We carefully evaluate MonoRec on the KITTI dataset and show that it achieves state-of-the-art performance compared to both multi-view and single-view methods. With the model trained on KITTI, we furthermore demonstrate that MonoRec generalizes well to both the Oxford RobotCar dataset and the more challenging TUM-Mono dataset recorded by a handheld camera.
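To make the abstract's two central ideas concrete (a cost volume that scores depth hypotheses by photometric consistency, and the observation that pixels no hypothesis explains hint at moving objects), here is a toy sketch. It is not MonoRec's method: the paper uses several consecutive frames with general camera motion and predicts masks with its learned MaskModule. The sketch instead assumes a single source frame, a purely horizontal camera translation (so depth d maps to disparity focal * baseline / d), nearest-pixel warping, absolute intensity differences, and a fixed threshold; the function names are hypothetical.

```python
import numpy as np


def shift_horizontal(src, disparity):
    """Sample src at (x - disparity, y) with nearest-pixel rounding.

    Samples that fall outside the source image become NaN.
    """
    h, w = src.shape
    xs = np.arange(w) - int(round(disparity))
    valid = (xs >= 0) & (xs < w)
    out = np.full((h, w), np.nan)
    out[:, valid] = src[:, xs[valid]]
    return out


def photometric_cost_volume(ref, src, depths, focal, baseline):
    """Toy cost volume C[i, y, x] = |ref - src warped with depths[i]|.

    Assumption: purely horizontal translation, so depth d -> disparity
    focal * baseline / d. Out-of-view samples receive infinite cost.
    """
    costs = np.empty((len(depths),) + ref.shape)
    for i, d in enumerate(depths):
        warped = shift_horizontal(src, focal * baseline / d)
        diff = np.abs(ref - warped)
        costs[i] = np.where(np.isnan(diff), np.inf, diff)
    return costs


def winner_take_all_depth(costs, depths):
    """Pick, per pixel, the depth hypothesis with the lowest cost."""
    return np.asarray(depths)[np.argmin(costs, axis=0)]


def inconsistency_mask(costs, threshold):
    """Hypothetical heuristic: True where no depth hypothesis matches well.

    A static pixel should have some near-zero cost; a moving object
    violates the static-scene warp, so all its costs stay high.
    """
    return costs.min(axis=0) > threshold


# Synthetic static scene: the source view sees the texture shifted by 4 px.
rng = np.random.default_rng(0)
texture = rng.random((8, 40))
ref = texture[:, :32].copy()
src = texture[:, 4:36]
depths = [8.0, 4.0, 2.0, 1.0]  # with focal * baseline = 8: disparities 1, 2, 4, 8
costs = photometric_cost_volume(ref, src, depths, focal=8.0, baseline=1.0)
depth = winner_take_all_depth(costs, depths)

# Simulate a "moving object": a patch whose intensity no static warp explains.
moved = ref.copy()
moved[2:5, 10:14] = 5.0
costs_moved = photometric_cost_volume(moved, src, depths, focal=8.0, baseline=1.0)
mask = inconsistency_mask(costs_moved, threshold=0.5)
```

In the demo, every pixel with x >= 4 recovers depth 2.0 (disparity 4), and only the altered patch is flagged in that region. Pixels near the left border, where the true disparity leaves the source image, are unreliable in both outputs, which is one reason a real system combines several frames.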
Pages: 6108-6118
Number of pages: 11