Self-supervised monocular Depth estimation with multi-scale structure similarity loss

Times cited: 5

Authors:

Han, Chenggong [1]

Cheng, Deqiang [1]

Kou, Qiqi [2]

Wang, Xiaoyi [1]

Chen, Liangliang [1]

Zhao, Jiamin [1]

Affiliations:

[1] China University of Mining and Technology, School of Information and Control Engineering, Xuzhou 221116, Jiangsu, People's Republic of China

[2] China University of Mining and Technology, School of Computer Science and Technology, Xuzhou 221116, Jiangsu, People's Republic of China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2022 / Vol. 82 / Issue 24

Funding:

National Natural Science Foundation of China

Keywords:

Self-supervised learning; Monocular depth estimation; Structural similarity; Attentional mechanism;

DOI:

10.1007/s11042-022-14012-6

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Raw depth images captured by depth sensors usually contain large regions of missing depth values, and these incomplete depth maps hinder many downstream vision tasks. The original photometric (luminosity) loss tends to produce incorrect depth estimates in regions with complex texture and for distant moving objects. To address this, this paper proposes a self-supervised monocular depth estimation algorithm based on a multi-scale structural similarity loss: structural similarity is computed at multiple scales when evaluating the loss, which strengthens the depth prediction network's perception of pixel edges. In addition, an attention mechanism is added to the encoder of the depth prediction network. By reweighting the feature maps, it lets the network suppress features that contribute little and strengthen features that aid the prediction. Experiments are conducted on the KITTI and Cityscapes datasets, and the results are compared with and analyzed against state-of-the-art algorithms. The results show that the proposed algorithm improves accuracy significantly, most notably on KITTI, where precision rises to 88.4%. Alongside this accuracy gain, the predicted depth maps are also visibly better, especially in Cityscapes scenes where several people overlap.
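The abstract does not give the loss formula. As a rough illustration only, assume the paper follows the common Monodepth2-style photometric error (Godard et al., ICCV 2019), pe = α(1 − SSIM)/2 + (1 − α)·|I_t − I_s|₁ with α = 0.85, and swaps single-scale SSIM for MS-SSIM. The NumPy sketch below computes that scalar error for single-channel images in [0, 1]. The 3×3 window, the five standard MS-SSIM weights of Wang et al. (2003), the clamping of negative terms, and the helper names `ms_ssim` and `photometric_loss` are all assumptions, not the authors' implementation.

```python
import numpy as np

# Assumed defaults (not confirmed by the paper): the five MS-SSIM scale weights
# from Wang et al. (2003) and the usual stabilizers for intensities in [0, 1].
MS_SSIM_WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)
C1 = 0.01 ** 2
C2 = 0.03 ** 2


def box_filter3(x):
    """3x3 local mean with reflection padding (the window Monodepth2 uses for SSIM)."""
    p = np.pad(x, 1, mode="reflect")
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0


def ssim_terms(a, b):
    """Return (mean full SSIM, mean contrast-structure term) at one scale."""
    mu_a, mu_b = box_filter3(a), box_filter3(b)
    var_a = box_filter3(a * a) - mu_a ** 2
    var_b = box_filter3(b * b) - mu_b ** 2
    cov = box_filter3(a * b) - mu_a * mu_b
    lum = (2 * mu_a * mu_b + C1) / (mu_a ** 2 + mu_b ** 2 + C1)
    cs = (2 * cov + C2) / (var_a + var_b + C2)
    return float((lum * cs).mean()), float(cs.mean())


def downsample2(x):
    """Halve resolution with 2x2 average pooling (an odd last row/column is cropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])


def ms_ssim(a, b, weights=MS_SSIM_WEIGHTS):
    """Multi-scale SSIM of two single-channel images with values in [0, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalize so identical images score exactly 1
    n = len(w)
    if min(a.shape) < 2 ** n:
        raise ValueError("image too small for the requested number of scales")
    score = 1.0
    for j in range(n):
        ssim_val, cs_val = ssim_terms(a, b)
        if j < n - 1:
            # Contrast-structure enters at every scale except the coarsest.
            # Negative bases are clamped to 0 to avoid NaN from fractional powers.
            score *= max(cs_val, 0.0) ** w[j]
            a, b = downsample2(a), downsample2(b)
        else:
            # Luminance is only used at the coarsest scale.
            score *= max(ssim_val, 0.0) ** w[j]
    return float(score)


def photometric_loss(target, warped, alpha=0.85):
    """alpha * (1 - MS-SSIM) / 2 + (1 - alpha) * mean L1 (Monodepth2-style weighting)."""
    target = np.asarray(target, dtype=np.float64)
    warped = np.asarray(warped, dtype=np.float64)
    ssim_term = min(max((1.0 - ms_ssim(target, warped)) / 2.0, 0.0), 1.0)
    l1_term = float(np.abs(target - warped).mean())
    return alpha * ssim_term + (1.0 - alpha) * l1_term
```

Calling `photometric_loss(target, warped)` on identical images returns 0.0, and the value grows as the warped source frame drifts from the target. A full self-supervised pipeline would compute this error per pixel for each source frame and combine the results (for example with Monodepth2's per-pixel minimum), which this scalar sketch does not attempt.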


Pages: 38035-38050

Number of pages: 16

Number of references: 39

