TFDEPTH: SELF-SUPERVISED MONOCULAR DEPTH ESTIMATION WITH MULTI-SCALE SELECTIVE TRANSFORMER FEATURE FUSION

Cited by: 0
Authors
Hu, Hongli [1]
Miao, Jun [1, 2]
Zhu, Guanghu [1]
Yan, Je [2]
Chu, Jun [3]
Affiliations
[1] Nanchang Hangkong Univ, Sch Aeronaut Mfg Engn, Nanchang, Peoples R China
[2] Chinese Acad Sci, Key Lab Lunar & Deep Space Explorat, Beijing, Peoples R China
[3] Nanchang Hangkong Univ, Key Lab Jiangxi Prov Image Proc & Pattern Recognit, Nanchang 330063, Peoples R China
Keywords
monocular depth estimation; multi-scale fusion; self-supervised learning; transformer
DOI
10.5566/ias.2987
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Existing self-supervised models for monocular depth estimation suffer from depth discontinuities, blurred edges, and unclear contours, particularly for small objects. We propose a self-supervised monocular depth estimation network with multi-scale selective Transformer feature fusion. To preserve more detailed features, we construct a multi-scale encoder to extract features and leverage the Transformer's self-attention mechanism to capture global contextual information, enabling better depth prediction for small objects. We also propose a multi-scale selective fusion (MSSF) module, which makes full use of multi-scale feature information in the decoder and performs selective fusion step by step. This effectively removes noise and retains local detail features, yielding a depth map with clear edges. Experimental evaluations on the KITTI dataset show that the proposed algorithm achieves an absolute relative error (Abs Rel) of 0.098 and an accuracy rate (delta) of 0.983. The results indicate that the proposed algorithm not only estimates depth values with high accuracy but also predicts continuous depth maps with clear edges.
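For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Abs Rel and threshold accuracy (delta) are commonly computed in depth estimation benchmarks. It is not taken from the paper: the function name `depth_metrics`, the default threshold of 1.25, and the convention of masking out pixels with zero ground-truth depth are assumptions for illustration, and the abstract does not state which delta threshold its 0.983 figure refers to.

```python
import numpy as np


def depth_metrics(pred, gt, threshold=1.25):
    """Return (abs_rel, delta) for predicted vs. ground-truth depth.

    Hypothetical helper for illustration only. Assumes predictions are
    strictly positive and that gt == 0 marks pixels without ground truth.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Keep only pixels that have a valid ground-truth depth.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    # Absolute relative error: mean of |d_pred - d_gt| / d_gt.
    abs_rel = np.mean(np.abs(pred - gt) / gt)

    # Threshold accuracy: fraction of pixels whose ratio
    # max(d_pred / d_gt, d_gt / d_pred) is below the threshold.
    ratio = np.maximum(pred / gt, gt / pred)
    delta = np.mean(ratio < threshold)

    return abs_rel, delta


# Toy example: the third pixel has no ground truth and is ignored.
gt_depth = [10.0, 20.0, 0.0, 40.0]
pred_depth = [11.0, 18.0, 5.0, 60.0]
abs_rel, delta = depth_metrics(pred_depth, gt_depth)
print(f"Abs Rel = {abs_rel:.4f}, delta = {delta:.4f}")
```

Lower Abs Rel and higher delta both indicate better agreement with the ground truth, which is the direction of improvement the abstract claims.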
Pages: 139-149
Number of pages: 11