A Transformer-Based Image-Guided Depth-Completion Model with Dual-Attention Fusion Module

Citations: 0
Authors
Wang, Shuling [1 ]
Jiang, Fengze [1 ]
Gong, Xiaojin [1 ]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China
Keywords
depth completion; dual-attention fusion module; multi-scale dual branch; NETWORK; PROPAGATION;
DOI
10.3390/s24196270
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline Code
070302; 081704;
Abstract
Depth information is crucial for perceiving three-dimensional scenes. However, depth maps captured directly by depth sensors are often incomplete and noisy. The objective of the depth-completion task is therefore to generate dense, accurate depth maps from sparse depth inputs by fusing guidance information from the corresponding color images captured by camera sensors. To address these challenges, we introduce transformer models, which have shown great promise in the field of vision, into the task of image-guided depth completion. Leveraging the self-attention mechanism, we propose a novel network architecture that effectively meets the requirements of high accuracy and high resolution in depth data. More specifically, we design a dual-branch model with a transformer-based encoder that serializes image features into tokens step by step and extracts multi-scale pyramid features suited to pixel-wise dense prediction tasks. Additionally, we incorporate a dual-attention fusion module to enhance the fusion between the two branches. This module combines convolution-based spatial- and channel-attention mechanisms, which are adept at capturing local information, with cross-attention mechanisms that excel at capturing long-distance relationships. Our model achieves state-of-the-art performance on both the NYUv2 and SUN RGB-D depth datasets, and our ablation studies confirm the effectiveness of the designed modules.
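The cross-attention component of the dual-attention fusion module described above can be illustrated with a minimal sketch: tokens from one branch (here, the sparse-depth branch) act as queries and attend over the tokens of the other branch (the color-image guidance branch). This is a generic single-head, dependency-free illustration of scaled dot-product cross-attention, not the authors' implementation; the token values and dimensions are made up for the example.

```python
import math

def matmul(A, B):
    # Plain-list matrix product: (n x d) @ (d x m) -> (n x m).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    # Numerically stable softmax over one score row.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    # One branch's tokens (queries) attend over the other branch's
    # tokens (keys/values): softmax(Q K^T / sqrt(d)) V.
    d = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]          # (d x n_k)
    scores = matmul(queries, keys_t)                     # (n_q x n_k)
    scale = 1.0 / math.sqrt(d)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values)                       # (n_q x d_v)

# Toy example: 2 depth-branch tokens querying 3 image-branch tokens (dim 4).
depth_tokens = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]]
image_tokens = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.5, 0.5, 0.0, 0.0]]
fused = cross_attention(depth_tokens, image_tokens, image_tokens)
```

Each fused token is a convex combination of the guidance tokens, which is what lets long-range image context flow into the depth branch; in the full module this output would be combined with the convolutional spatial- and channel-attention paths.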
Pages: 21