Self-Supervised Monocular Depth Estimation With Positional Shift Depth Variance and Adaptive Disparity Quantization

Cited by: 9
Authors
Bello, Juan Luis Gonzalez [1 ]
Moon, Jaeho [1 ]
Kim, Munchurl [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol (KAIST), Sch Elect Engn, Daejeon 34141, South Korea
Keywords
Depth from videos; self-supervised; monocular depth estimation; deep convolutional neural networks
DOI
10.1109/TIP.2024.3374045
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recently, attempts to learn the underlying 3D structures of a scene from monocular videos in a fully self-supervised fashion have drawn much attention. One of the most challenging aspects of this task is to handle independently moving objects as they break the rigid-scene assumption. In this paper, we show for the first time that pixel positional information can be exploited to learn SVDE (Single View Depth Estimation) from videos. The proposed moving object (MO) masks, which are induced by the depth variance to shifted positional information (SPI) and are referred to as 'SPIMO' masks, are highly robust and consistently remove independently moving objects from the scenes, allowing for robust and consistent learning of SVDE from videos. Additionally, we introduce a new adaptive quantization scheme that assigns the best per-pixel quantization curve for depth discretization, improving the fine granularity and accuracy of the final aggregated depth maps. Finally, we employ existing boosting techniques in a new way that self-supervises moving object depths further. With these features, our pipeline is robust against moving objects and generalizes well to high-resolution images, even when trained with small patches, yielding state-of-the-art (SOTA) results with four- to eight-fold fewer parameters than the previous SOTA techniques that learn from videos. We present extensive experiments on KITTI and CityScapes that show the effectiveness of our method.
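The abstract's adaptive quantization idea (a per-pixel quantization curve for depth discretization) can be illustrated with a minimal sketch. This is not the authors' actual scheme; the function `adaptive_quantize` and its parameters `gamma`, `n_levels`, and `d_max` are hypothetical, chosen only to show how a per-pixel curvature parameter can warp a uniform bin grid before rounding:

```python
import numpy as np

def adaptive_quantize(disparity, gamma, n_levels=32, d_max=1.0):
    """Illustrative per-pixel disparity quantization (hypothetical scheme).

    gamma is a per-pixel curvature: gamma = 1 gives uniform bins; gamma > 1
    concentrates quantization levels at large disparities (nearby scene).
    """
    # Normalize disparity to [0, 1].
    d = np.clip(np.asarray(disparity, dtype=float) / d_max, 0.0, 1.0)
    g = np.asarray(gamma, dtype=float)
    # Warp by the per-pixel quantization curve, then snap to the
    # nearest of n_levels uniformly spaced levels in the warped domain.
    warped = d ** g
    levels = np.round(warped * (n_levels - 1))
    # Invert the warp to recover the quantized disparity value.
    return (levels / (n_levels - 1)) ** (1.0 / g) * d_max
```

With `n_levels = 5`, a disparity of 0.8 snaps to 0.75 under a linear curve (`gamma = 1`) but to about 0.866 under `gamma = 2`, since the warped grid places finer levels near the maximum disparity.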
Pages: 2074-2089 (16 pages)
Related Papers (59 in total)
[1] Bae J. AAAI Conf. Artif. Intell., 2023, p. 187.
[2] Bangunharcana A, Magd A, Kim KS. DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium. IEEE/CVF CVPR, 2023, pp. 726-738.
[3] Bello JLG, Kim M. PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss. IEEE/CVF CVPR, 2021, pp. 6847-6856.
[4] Bello JLG, Kim M. Self-Supervised Deep Monocular Depth Estimation With Ambiguity Boosting. IEEE Trans. Pattern Anal. Mach. Intell., 2022, 44(12): 9131-9149.
[5] Bhat SF, Alhashim I, Wonka P. AdaBins: Depth Estimation Using Adaptive Bins. IEEE/CVF CVPR, 2021, pp. 4008-4017.
[6] Chen X, Zhang R, Jiang J, Wang Y, Li G, Li TH. Self-Supervised Monocular Depth Estimation: Solving the Edge-Fattening Problem. IEEE/CVF WACV, 2023, pp. 5765-5775.
[7] Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B. The Cityscapes Dataset for Semantic Urban Scene Understanding. IEEE CVPR, 2016, pp. 3213-3223.
[8] Eigen D. Adv. Neural Inf. Process. Syst., 2014, vol. 27.
[9] Feng Z, Yang L, Jing L, Wang H, Tian Y, Li B. Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth. ECCV 2022, Pt. XXXII, vol. 13692, pp. 228-244.
[10] Fu H, Gong M, Wang C, Batmanghelich K, Tao D. Deep Ordinal Regression Network for Monocular Depth Estimation. IEEE/CVF CVPR, 2018, pp. 2002-2011.