Joint Attention Mechanisms for Monocular Depth Estimation With Multi-Scale Convolutions and Adaptive Weight Adjustment

Cited by: 11
Authors
Liu, Peng [1 ,2 ,3 ]
Zhang, Zonghua [1 ,2 ]
Meng, Zhaozong [2 ]
Gao, Nan [2 ]
Affiliations
[1] Hebei Univ Technol, State Key Lab Reliabil & Intelligence Elect Equip, Tianjin 300130, Peoples R China
[2] Hebei Univ Technol, Sch Mech Engn, Tianjin 300130, Peoples R China
[3] Tangshan Univ, Sch Intelligence & Informat Engn, Tangshan 063000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Estimation; Feature extraction; Training; Task analysis; Spatial resolution; Decoding; Aggregates; Monocular depth estimation; multi-scale convolutions; joint attention mechanisms; weight adjustment; IMAGE; NETWORKS;
DOI
10.1109/ACCESS.2020.3030097
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Monocular depth estimation is a fundamental problem for many vision applications and is therefore gaining increasing attention in computer vision. Although great improvements have been made thanks to the rapid progress of deep convolutional neural networks, estimating depth at fine object details remains unsatisfactory, especially in complex scenes with rich structural information. In this article, we propose a deep end-to-end learning framework that combines multi-scale convolutions with joint attention mechanisms to tackle this challenge. Specifically, we first design a lightweight up-convolution to generate multi-scale feature maps. We then introduce an attention-based residual block that aggregates the different feature maps jointly along the channel and spatial dimensions, enhancing the discriminative ability of feature fusion at fine details. Furthermore, we explore an effective adaptive weight-adjustment strategy for the loss function that adjusts the weight of each loss term during training without additional hyper-parameters, further improving performance. The proposed framework was evaluated on the challenging NYU Depth v2 and KITTI datasets. Experimental results demonstrate that the proposed approach outperforms most state-of-the-art methods.
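The abstract names two mechanisms: a residual block that gates features jointly along the channel and spatial dimensions, and a loss whose term weights adapt during training without extra hyper-parameters. The NumPy sketch below illustrates both ideas in highly simplified form; the pooling choices, the sequential channel-then-spatial gating, and the uncertainty-style weighting rule are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Gate each channel by a weight derived from its global average.
    feat: (C, H, W) feature map. A real block would pass the pooled
    vector through a small MLP; that is omitted here for brevity."""
    weights = sigmoid(feat.mean(axis=(1, 2)))       # (C,)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Gate each spatial location by a weight from its channel average."""
    weights = sigmoid(feat.mean(axis=0))            # (H, W)
    return feat * weights[None, :, :]

def joint_attention_block(feat):
    """Residual block: sequential channel + spatial gating, added back
    onto the input so the identity path is preserved."""
    return feat + spatial_attention(channel_attention(feat))

def adaptive_total_loss(losses, log_scales):
    """One common hyper-parameter-free weighting scheme (uncertainty
    weighting): each loss term is scaled by exp(-s) with a +s
    regularizer, where the s values are learned alongside the network
    weights instead of being hand-tuned."""
    return sum(np.exp(-s) * L + s for L, s in zip(losses, log_scales))

feat = np.random.rand(8, 16, 16).astype(np.float32)
out = joint_attention_block(feat)
print(out.shape)                                     # (8, 16, 16)
print(adaptive_total_loss([0.8, 0.3], [0.0, 0.0]))   # approximately 1.1
```

Because the attention weights lie in (0, 1), the gated branch is a damped copy of the input; the residual connection then lets the block emphasize informative channels and locations without ever suppressing the original features entirely.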
Pages: 184437-184450
Page count: 14