Self-supervised monocular depth estimation in dynamic scenes based on deep learning

Cited by: 0
Authors
Cheng, Binbin [1]
Yu, Ying [1]
Zhang, Lei [1]
Wang, Ziquan [1]
Jiang, Zhipeng [1]
Affiliations
[1] Information Engineering University, Institute of Geospatial Information, Zhengzhou
Funding
National Natural Science Foundation of China
Keywords
3D reconstruction; deep learning; dynamic scenes; monocular depth estimation; remote sensing; self-supervised learning
DOI
10.11834/jrs.20233060
Abstract
In the real world, completely static scenes do not exist. Monocular depth estimation in dynamic scenes refers to recovering the depth of both the dynamic foreground and the static background from a single image. Compared with traditional stereo estimation methods, it is more flexible and more cost-effective. It has strong research relevance and broad development prospects, and it plays a key role in downstream tasks such as 3D reconstruction and autonomous driving. With the rapid development of deep learning, self-supervised learning, which does not require ground-truth labels, has attracted wide interest. Researchers in China and abroad have proposed a series of self-supervised monocular depth estimation algorithms that handle dynamic objects in scenes, laying a foundation for work in related fields. However, a comprehensive analysis of these methods has yet to be conducted. To address this gap, this study systematically reviews and summarizes the progress of deep-learning-based self-supervised monocular depth estimation in dynamic scenes. First, the basic models of self-supervised monocular depth estimation based on deep learning are summarized, and the way self-supervised constraints are applied between images is analyzed and explained. A basic framework diagram of self-supervised monocular depth estimation based on continuous frames is also given. The effect of dynamic objects on images is explained from four aspects: epipolar lines, triangulation, fundamental matrix estimation, and reprojection error. Second, commonly used datasets and evaluation metrics for monocular depth estimation research are introduced. The KITTI and Cityscapes datasets provide continuous outdoor image data, while the NYU Depth V2 dataset provides indoor dynamic scene data; these datasets are generally used for model training. The Make3D dataset has depth data but discontinuous images, and it is generally used to test the generalization ability of a model. The algorithms are quantitatively analyzed using root mean square error (RMSE), logarithmic root mean square error (RMSE log), absolute relative error (Abs Rel), squared relative error (Sq Rel), and accuracy (Acc), and the performance of classic monocular depth estimation models in dynamic scenes is compared and analyzed. Then, on the basis of how dynamic objects are handled, two research directions are summarized and analyzed: robust depth estimation in dynamic scenes, and dynamic object tracking and depth estimation. In robust depth estimation in dynamic scenes, dynamic objects are extracted and treated as outliers during model training to minimize their effect, so that training relies solely on static background information. In dynamic object tracking and depth estimation, the dynamic foreground and the static background are accurately distinguished and the two regions are processed separately. Various algorithms that detect and segment dynamic objects based on optical flow information, semantic information, and other cues while estimating their motion are explained. The advantages and disadvantages of each type of algorithm are also summarized and analyzed on the basis of commonly used evaluation criteria. Finally, future development directions of monocular depth estimation in dynamic scenes are discussed from four aspects: network model optimization, online learning and generalization, real-time operation on embedded devices, and domain adaptation of self-supervised learning.
© 2024 Science Press. All rights reserved.
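The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions below are assumptions that follow the conventions commonly used in depth estimation benchmarks: Sq Rel divides the squared error by the ground-truth depth, and Acc is the fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25^k for k = 1, 2, 3. The function name `depth_metrics` and the flat-list inputs are hypothetical choices for illustration, and predictions are assumed to be strictly positive.

```python
import math


def depth_metrics(pred, gt):
    """Compare predicted and ground-truth depths given as flat sequences.

    Assumption: pixels whose ground-truth depth is not positive are
    treated as missing and skipped; predictions must be positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)

    # Relative errors, normalised by the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n

    # Root mean square error in linear and in log depth.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Threshold accuracy: share of pixels with ratio below 1.25 ** k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    acc = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "acc_1": acc[1],
        "acc_2": acc[2],
        "acc_3": acc[3],
    }


# Toy example: the last pixel has no ground truth and is ignored.
metrics = depth_metrics([1.0, 2.0, 2.0, 5.0], [1.0, 2.0, 4.0, 0.0])
```

Lower values are better for the four error metrics, and higher values are better for the accuracies. A practical evaluation would normally run the same arithmetic on whole depth maps with an array library; plain Python is used here only to keep the sketch self-contained.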
Pages: 2170-2186
Page count: 16