Unsupervised Monocular Depth Estimation Based on Scale Clue Enhancement

Cited: 0
Authors
Qu, Yi [1 ]
Chen, Ying [1 ]
Affiliation
[1] Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi, Jiangsu, China
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2024, Vol. 52, Issue 09
Funding
National Natural Science Foundation of China
Keywords
channel attention; deep learning; monocular depth estimation; multi-scale; unsupervised learning;
DOI
10.12263/DZXB.20230767
Abstract
Because the mapping from a single image to a depth map is one-to-many, monocular depth estimation suffers from an inherent scale ambiguity. To alleviate this ambiguity in the geometric modeling of monocular depth estimation, this paper introduces a monocular multi-frame depth estimation method based on multi-view stereo (MVS) to construct motion-derived depth and mine scale cues, organically combining traditional monocular depth estimation with MVS depth estimation. On this basis, two channel attention modules are designed to improve the network's ability to perceive scene structure and process local information, so that features at different scales are fused more thoroughly and the network produces more accurate and sharper depth maps. In tests on the KITTI dataset, the average relative error and squared relative error of the proposed method are reduced by 4.7% and 8.0%, respectively, compared with the baseline network, and all error and accuracy metrics surpass those of other mainstream unsupervised monocular depth estimation methods. © 2024 Chinese Institute of Electronics. All rights reserved.
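Since the abstract describes channel attention modules for multi-scale feature fusion and reports relative-error metrics on KITTI, the sketch below illustrates both in PyTorch. It is a minimal sketch under stated assumptions: the `ChannelAttention` module is a generic squeeze-and-excitation style block, not the paper's actual design, and the metrics are assumed to correspond to the standard KITTI Abs Rel and Sq Rel definitions; all names here are illustrative.

```python
# Hedged sketch: generic channel attention + standard KITTI relative-error metrics.
# Not the paper's implementation; module design and names are assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Assumed SE-style channel attention used to reweight multi-scale features."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average
        self.fc = nn.Sequential(                 # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # reweighted feature map


def abs_rel_sq_rel(pred: torch.Tensor, gt: torch.Tensor):
    """Standard KITTI metrics: Abs Rel = mean(|d-d*|/d*), Sq Rel = mean((d-d*)^2/d*)."""
    abs_rel = torch.mean(torch.abs(pred - gt) / gt)
    sq_rel = torch.mean((pred - gt) ** 2 / gt)
    return abs_rel.item(), sq_rel.item()


if __name__ == "__main__":
    feats = torch.randn(2, 64, 48, 160)          # e.g. one scale of decoder features
    print(ChannelAttention(64)(feats).shape)     # torch.Size([2, 64, 48, 160])
    pred = torch.rand(2, 1, 96, 320) * 80 + 1e-3
    gt = torch.rand(2, 1, 96, 320) * 80 + 1e-3
    print(abs_rel_sq_rel(pred, gt))
```

In this reading, the attention block would be applied to decoder features at each scale before fusion, and the two metric functions reproduce how a "4.7% / 8.0% reduction versus the baseline" would typically be measured on KITTI.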
Pages: 3217-3227
Number of pages: 10