Dual-branch Monocular Depth Estimation Method with Attention Mechanism

Times cited: 0

Authors:

Zhou, Chengying [1]

He, Lixin [2]

Wang, Handong [2]

Cheng, Zhi [2]

Yang, Jing [2]

Cao, Shenjie [2]

Affiliations:

[1] Anhui Jianzhu University, School of Electronic and Information Engineering, Hefei, People's Republic of China

[2] Hefei University, School of Artificial Intelligence and Big Data, Hefei, People's Republic of China

Source:

2024 9th International Conference on Electronic Technology and Information Science (ICETIS 2024) | 2024

Keywords:

attention mechanism; multi-scale feature extraction; monocular depth estimation

DOI:

10.1109/ICETIS61828.2024.10593699

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

To address blurred edges on nearby objects and blurry, inaccurate depth on local objects, this paper proposes a dual-branch monocular depth estimation method that incorporates attention mechanisms. The method adopts a classification-regression formulation: a Transformer branch extracts global, long-range contextual relationships, while a CNN branch processes densely textured regions to extract local features. A cross-attention-based feature fusion module integrates the features of the two branches and adaptively divides the depth range into intervals to produce the final depth map. In addition, to compensate for the inherent shortcomings of the dual-branch baseline network, a dense atrous convolution pyramid module is proposed to further extract multi-scale features, and a parallel channel and position attention module is proposed to model channel-wise and position-wise correlations. On two public datasets, KITTI and NYU-Depth V2, the accuracy at the lowest threshold (δ < 1.25) reaches 97.6% and 92.7%, respectively. Qualitative and quantitative results show that the method effectively reduces depth ambiguity on local objects and produces high-quality scene depth maps.
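The classification-regression idea with adaptively divided depth intervals resembles AdaBins (Bhat et al., CVPR 2021): the network predicts per-image bin widths over a fixed depth range plus per-pixel scores over those bins, and the final depth is the probability-weighted sum of bin centers. The minimal single-pixel sketch below assumes that formulation; normalizing the widths with a softmax (AdaBins uses a ReLU-plus-epsilon normalization) and the function names are illustrative assumptions, not the paper's code.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_bin_centers(width_scores: List[float], d_min: float, d_max: float) -> List[float]:
    """Split [d_min, d_max] into bins whose widths are predicted per image.

    Widths are normalized to sum to 1 (softmax here, an assumption), so the
    bins tile the whole depth range with no gaps or overlaps.
    """
    widths = softmax(width_scores)
    centers = []
    left = d_min
    for w in widths:
        span = (d_max - d_min) * w
        centers.append(left + span / 2)
        left += span
    return centers


def pixel_depth(bin_scores: List[float], centers: List[float]) -> float:
    """Classification (probabilities over bins) followed by regression
    (probability-weighted sum of bin centers)."""
    probs = softmax(bin_scores)
    return sum(p * c for p, c in zip(probs, centers))


# Usage: four equal-width bins over 0-10 m.
centers = adaptive_bin_centers([0.0, 0.0, 0.0, 0.0], 0.0, 10.0)
print(centers)                                     # [1.25, 3.75, 6.25, 8.75]
print(pixel_depth([0.0, 0.0, 0.0, 0.0], centers))  # 5.0
```

Because the bin widths are predicted per image, a scene with many nearby objects can spend more, narrower bins on short distances, which is the motivation for dividing the depth range adaptively rather than uniformly.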
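The abstract describes a cross-attention fusion module that lets the Transformer and CNN branches exchange information. A minimal sketch, assuming both branches have already been projected to the same number of aligned tokens and the same embedding width, and using plain scaled dot-product attention without learned query/key/value projections; the bidirectional averaging rule in the hypothetical `fuse_branches` is an illustration, not the paper's design.

```python
import math
from typing import List

Tokens = List[List[float]]  # one feature vector per token (flattened spatial position)


def cross_attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Scaled dot-product attention where queries come from one branch and
    keys/values come from the other branch."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the other branch's value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))])
    return out


def fuse_branches(cnn_tokens: Tokens, vit_tokens: Tokens) -> Tokens:
    """Hypothetical fusion rule: each branch queries the other, adds the result
    to its own features (residual), and the two enriched streams are averaged."""
    cnn_enriched = cross_attention(cnn_tokens, vit_tokens, vit_tokens)
    vit_enriched = cross_attention(vit_tokens, cnn_tokens, cnn_tokens)
    fused = []
    for c, ce, t, te in zip(cnn_tokens, cnn_enriched, vit_tokens, vit_enriched):
        fused.append([((a + b) + (x + y)) / 2 for a, b, x, y in zip(c, ce, t, te)])
    return fused


# Usage with toy values: 3 tokens of width 2 from each branch.
cnn = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vit = [[0.2, 0.1], [0.3, -0.2], [0.0, 0.4]]
print(fuse_branches(cnn, vit))
```

Letting local CNN tokens query global Transformer tokens (and vice versa) is one way to inject long-range context into texture-rich regions without simply concatenating the two feature maps.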
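The abstract names a dense atrous convolution pyramid for multi-scale features but gives no dilation rates, kernel sizes, or wiring. The 1-D sketch below assumes DenseASPP-style dense connections (Yang et al., CVPR 2018), simplified so that each stage sums, rather than concatenates, the input and all earlier stage outputs; the rates, kernel, and function names are hypothetical.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """'Same'-size 1-D atrous convolution with zero padding (odd-length kernel)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - center) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def dense_atrous_pyramid(x, kernel, rates=(1, 2, 4)):
    """Each stage convolves the sum of the input and all earlier stage outputs
    (a summation stand-in for DenseASPP's concatenation), so receptive fields
    compound across stages. Returns every stage's output."""
    features = [list(x)]
    for r in rates:
        stage_input = [sum(vals) for vals in zip(*features)]
        features.append(dilated_conv1d(stage_input, kernel, r))
    return features[1:]


impulse = [0.0] * 9
impulse[4] = 1.0
print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 2))
# [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(dense_atrous_pyramid(impulse, [1.0, 1.0, 1.0], rates=(1, 2))[-1])
# [0.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 0.0]
```

The impulse responses show the design choice: a lone rate-2 convolution samples a sparse, gapped neighborhood, while the densely connected stages cover a wider, gap-free span, which is the usual argument for dense rather than purely parallel atrous pyramids.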
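The parallel channel and position attention module is described only at a high level. The parameter-free sketch below assumes a DANet-style dual attention design (Fu et al., CVPR 2019): a position branch lets every spatial location aggregate all locations weighted by feature similarity, a channel branch does the same across channels, and both outputs are added to the input. DANet itself also uses learned projections and learnable scale factors, which are omitted here; all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # feature map stored as C rows (channels) x N columns (positions)


def softmax_rows(scores: Matrix) -> Matrix:
    """Row-wise numerically stable softmax."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def position_attention(x: Matrix) -> Matrix:
    """out[:, i] = sum_j attn[i][j] * x[:, j], with attn from position similarity."""
    xt = transpose(x)                   # N x C: one feature vector per position
    attn = softmax_rows(matmul(xt, x))  # N x N position-to-position weights
    return matmul(x, transpose(attn))   # C x N


def channel_attention(x: Matrix) -> Matrix:
    """out[c, :] = sum_k attn[c][k] * x[k, :], with attn from channel similarity."""
    attn = softmax_rows(matmul(x, transpose(x)))  # C x C channel-to-channel weights
    return matmul(attn, x)                        # C x N


def dual_attention(x: Matrix) -> Matrix:
    """Run both branches in parallel and sum them with a residual connection."""
    pos = position_attention(x)
    chan = channel_attention(x)
    return [[v + p + c for v, p, c in zip(xr, pr, cr)] for xr, pr, cr in zip(x, pos, chan)]


# Usage: 2 channels x 3 positions (toy values).
print(dual_attention([[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]))
```

Running the two branches in parallel, rather than in sequence, keeps channel correlations and spatial correlations independent before they are merged.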
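The reported KITTI and NYU-Depth V2 figures are threshold accuracies. Assuming the standard definition (the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25^k, with k = 1 the strictest threshold), the computation can be sketched as follows; the validity rules are assumptions, since each benchmark applies its own masks, crops, and depth caps.

```python
from typing import Sequence


def threshold_accuracy(pred: Sequence[float], gt: Sequence[float], k: int = 1) -> float:
    """Fraction of valid pixels with max(pred/gt, gt/pred) < 1.25**k.

    k = 1 is the strictest threshold, commonly reported as delta_1. Pixels
    with non-positive ground truth are skipped as invalid, and non-positive
    predictions count as misses (both rules are assumptions).
    """
    threshold = 1.25 ** k
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # no ground-truth depth at this pixel
        valid += 1
        if p > 0 and max(p / g, g / p) < threshold:
            hits += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return hits / valid


pred = [1.0, 2.0, 3.0, 5.0]
gt = [1.1, 2.0, 4.0, 0.0]  # the last pixel has no ground truth
print(threshold_accuracy(pred, gt))       # 0.6666666666666666
print(threshold_accuracy(pred, gt, k=2))  # 1.0
```

Because the metric uses a symmetric ratio, over- and under-estimating depth by the same factor are penalized equally, which is why it is reported alongside absolute error measures rather than instead of them.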


Pages: 421–426

Number of pages: 6

Number of references: 31
