Underwater Monocular Depth Estimation Based on Physical-Guided Transformer

Cited by: 9

Authors:

Wang, Chen [1]

Xu, Haiyong [1]

Jiang, Gangyi [2]

Yu, Mei [2]

Luo, Ting [2]

Chen, Yeyao [2]

Affiliations:

[1] Ningbo Univ, Sch Math & Stat, Ningbo 315211, Peoples R China

[2] Ningbo Univ, Fac Informat Sci & Engn, Ningbo 315211, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Keywords:

Estimation; Transformers; Feature extraction; Decoding; Cameras; Task analysis; Geoscience and remote sensing; Physical-guided Transformer; physically inverted transmission maps; underwater monocular depth estimation; RESTORATION; ENHANCEMENT; NETWORK;

DOI:

10.1109/TGRS.2024.3373904

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification codes:

0708 ; 070902 ;

Abstract:

Owing to wavelength-dependent light absorption and scattering in underwater environments, underwater images are severely degraded, which directly affects the depth estimation of underwater scenes. Accurate underwater depth estimation is essential for representing and understanding underwater scenes. However, existing underwater depth estimation methods do not fully account for the distinctive physical properties of underwater environments, which leads to increased bias and feature distortion in the depth estimation results. In this article, an underwater monocular depth estimation method based on a physical-guided Transformer (UPGformer) is proposed that takes the characteristics of underwater imaging into account. It comprises four stages: shallow feature extraction, encoding, decoding, and regression. Specifically, in the shallow feature extraction stage, to handle the color deviation of underwater images and extract richer primary features, an enrichment and extraction depth Transformer (EEDT) module is proposed, in which physically inverted transmission maps from the underwater dark channel prior (UDCP) interact with physically color-compensated underwater images through self-attention. In the encoding stage, to handle the nonuniform degradation of underwater images (nonuniform local distortion and inconsistent channel degradation), the underwater physical Transformer interaction encoder (UPTE) module is proposed, which fuses the Transformer with the physically inverted transmission maps. Furthermore, in the decoding stage, to better recover features and reduce information loss, the underwater physical embedded decoding (UPED) module is proposed, which embeds the physically inverted transmission maps into the upsampling process. Finally, the depth map is constructed in the regression stage. The experimental results demonstrate that the proposed UPGformer outperforms existing methods, both qualitatively and quantitatively.
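For readers unfamiliar with the physical prior named in the abstract, the following is a minimal sketch of how a UDCP-style transmission map and its inverse could be computed. It is not the authors' implementation. The function names `min_filter` and `udcp_inverted_transmission`, the parameters `patch` and `omega` (the latter borrowed from the dark channel prior dehazing formulation), the background-light heuristic, and the reading of "inverted" as `1 - t` are all assumptions made for illustration; the record above does not specify them.

```python
import numpy as np


def min_filter(img2d, patch):
    """Minimum over a patch x patch neighbourhood, using edge padding."""
    pad = patch // 2
    padded = np.pad(img2d, pad, mode="edge")
    h, w = img2d.shape
    out = np.full((h, w), np.inf)
    for dy in range(patch):
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out


def udcp_inverted_transmission(rgb, patch=15, omega=0.95):
    """Hypothetical helper. rgb: float array in [0, 1], shape (H, W, 3).

    Returns (t, 1 - t): the estimated transmission map and its inverse.
    """
    gb = rgb[..., 1:]  # UDCP uses only green and blue; red attenuates fastest
    dark = min_filter(gb.min(axis=2), patch)

    # Assumed heuristic: background light is the mean G/B colour of the
    # 0.1% of pixels with the largest dark-channel value.
    n = max(1, dark.size // 1000)
    idx = np.argsort(dark.ravel())[-n:]
    background = np.maximum(gb.reshape(-1, 2)[idx].mean(axis=0), 1e-6)

    # Dark channel of the image normalised by the background light.
    norm_dark = min_filter((gb / background).min(axis=2), patch)
    t = np.clip(1.0 - omega * norm_dark, 0.0, 1.0)
    return t, 1.0 - t  # assumption: "inverted" means 1 - t
```

Under the usual attenuation model, transmission decreases with scene distance, so the inverse `1 - t` grows with distance, which is one plausible reason such a map would serve as a depth cue.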

