A Transformer-Based Image-Guided Depth-Completion Model with Dual-Attention Fusion Module

Citations: 0
Authors
Wang, Shuling [1 ]
Jiang, Fengze [1 ]
Gong, Xiaojin [1 ]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China
Keywords
depth completion; dual-attention fusion module; multi-scale dual branch; NETWORK; PROPAGATION;
DOI
10.3390/s24196270
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline Code
070302; 081704;
Abstract
Depth information is crucial for perceiving three-dimensional scenes. However, depth maps captured directly by depth sensors are often incomplete and noisy. The objective of the depth-completion task is therefore to generate dense, accurate depth maps from sparse depth inputs by fusing guidance information from the corresponding color images captured by camera sensors. To address these challenges, we introduce transformer models, which have shown great promise in the field of vision, into the task of image-guided depth completion. Leveraging the self-attention mechanism, we propose a novel network architecture that effectively meets the requirements of high accuracy and high resolution in depth data. More specifically, we design a dual-branch model with a transformer-based encoder that serializes image features into tokens step by step and extracts multi-scale pyramid features suited to pixel-wise dense prediction tasks. Additionally, we incorporate a dual-attention fusion module to enhance the fusion between the two branches. This module combines convolution-based spatial- and channel-attention mechanisms, which are adept at capturing local information, with cross-attention mechanisms that excel at capturing long-distance relationships. Our model achieves state-of-the-art performance on both the NYUv2 and SUN RGB-D depth datasets, and our ablation studies confirm the effectiveness of the designed modules.
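The cross-attention component of the dual-attention fusion module described above can be illustrated with a minimal sketch: tokens from one branch (here, the sparse-depth branch) act as queries and attend over the tokens of the other branch (the color-image guidance branch). This is a generic single-head, dependency-free illustration of scaled dot-product cross-attention, not the authors' implementation; the token values and dimensions are made up for the example.

```python
import math

def matmul(A, B):
    # Plain-list matrix product: (n x d) @ (d x m) -> (n x m).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    # Numerically stable softmax over one score row.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    # One branch's tokens (queries) attend over the other branch's
    # tokens (keys/values): softmax(Q K^T / sqrt(d)) V.
    d = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]          # (d x n_k)
    scores = matmul(queries, keys_t)                     # (n_q x n_k)
    scale = 1.0 / math.sqrt(d)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values)                       # (n_q x d_v)

# Toy example: 2 depth-branch tokens querying 3 image-branch tokens (dim 4).
depth_tokens = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]]
image_tokens = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.5, 0.5, 0.0, 0.0]]
fused = cross_attention(depth_tokens, image_tokens, image_tokens)
```

Each fused token is a convex combination of the guidance tokens, which is what lets long-range image context flow into the depth branch; in the full module this output would be combined with the convolutional spatial- and channel-attention paths.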
Pages: 21