Self-supervised Object Motion and Depth Estimation from Video

被引:21
作者
Dai, Qi [1 ,3 ]
Patii, Vaishakh [1 ]
Hecker, Simon [1 ]
Dai, Dengxin [1 ]
Van Gool, Luc [1 ,2 ]
Schindler, Konrad [1 ,3 ]
机构
[1] Swiss Fed Inst Technol, Comp Vis Lab, Zurich, Switzerland
[2] Katholieke Univ Leuven, ESAT PSI, VISICS, Leuven, Belgium
[3] Swiss Fed Inst Technol, Inst Geodesy & Photogrammetry, Zurich, Switzerland
来源
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020年
关键词
D O I
10.1109/CVPRW50498.2020.00510
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a self-supervised learning framework to estimate the individual object motion and monocular depth from video. We model the object motion as a 6 degree-of-freedom rigid-body transformation. The instance segmentation mask is leveraged to introduce the information of object. Compared with methods which predict dense optical flow map to model the motion, our approach significantly reduces the number of values to be estimated. Our system eliminates the scale ambiguity of motion prediction through imposing a novel geometric constraint loss term. Experiments on KITTI driving dataset demonstrate our system is capable to capture the object motion without external annotation. Our system outperforms previous self-supervised approaches in terms of 3D scene flow prediction, and contribute to the disparity prediction in dynamic area.
引用
收藏
页码:4326 / 4334
页数:9
相关论文
共 35 条
[1]  
Abadi M, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P265
[2]  
Battrawy R, 2019, IEEE INT C INT ROBOT, P7762, DOI [10.1109/IROS40897.2019.8967739, 10.1109/iros40897.2019.8967739]
[3]  
Cao Zhe, 2019, CVPR
[4]  
Casser V, 2019, AAAI CONF ARTIF INTE, P8001
[5]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[6]   FlowNet: Learning Optical Flow with Convolutional Networks [J].
Dosovitskiy, Alexey ;
Fischer, Philipp ;
Ilg, Eddy ;
Haeusser, Philip ;
Hazirbas, Caner ;
Golkov, Vladimir ;
van der Smagt, Patrick ;
Cremers, Daniel ;
Brox, Thomas .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2758-2766
[7]  
Eigen D., 2014, Advances in neural information processing systems, P2366, DOI [DOI 10.5555/2969033.2969091, DOI 10.1007/978-3-540-28650-9_5]
[8]  
Geiger A., 2011, INTELLIGENT VEHICLES
[9]  
Geiger A, 2012, PROC CVPR IEEE, P3354, DOI 10.1109/CVPR.2012.6248074
[10]   Unsupervised Monocular Depth Estimation with Left-Right Consistency [J].
Godard, Clement ;
Mac Aodha, Oisin ;
Brostow, Gabriel J. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6602-6611