A Contour-Aware Monocular Depth Estimation Network Using Swin Transformer and Cascaded Multiscale Fusion

Cited by: 1
Authors
Li, Tao [1 ]
Zhang, Yi [1 ]
Affiliations
[1] Sichuan University, College of Computer Science, Chengdu 610065, People's Republic of China
Keywords
Cascaded multiscale fusion; contour aware; monocular depth estimation; Swin Transformer
DOI
10.1109/JSEN.2024.3370821
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification Code
0808; 0809
Abstract
Depth estimation from a monocular vision sensor is a fundamental problem in scene perception with wide industrial applications. Previous works tend to predict scene depth from high-level features obtained by convolutional neural networks (CNNs) or rely on Transformer-based encoder-decoder frameworks. However, these approaches achieve less satisfactory results, especially around object contours. In this article, we propose a Transformer-based contour-aware depth estimation module that recovers scene depth with the aid of an enhanced perception of object contours. In addition, we develop a cascaded multiscale fusion module to aggregate multilevel features, which combines global context with local information and refines the depth map from coarse to fine toward higher resolution. Finally, we model depth estimation as a classification problem and discretize the depth range adaptively to further improve the performance of our network. Extensive experiments on mainstream public datasets (KITTI and NYUv2) demonstrate the effectiveness of our network, which outperforms other state-of-the-art methods.
Pages: 13620-13628
Number of pages: 9
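
The abstract above models depth estimation as a classification problem over adaptively discretized depth bins. The following is a minimal PyTorch sketch of that idea, in the spirit of adaptive-bin methods such as AdaBins; it is not the authors' implementation, and the module name, bin count, and depth range below are illustrative assumptions.

```python
# Hypothetical sketch: per-image adaptive depth bins + per-pixel bin classification.
# Depth is recovered as the probability-weighted sum of the adaptive bin centers.
import torch
import torch.nn as nn


class AdaptiveDepthHead(nn.Module):
    def __init__(self, in_channels: int, n_bins: int = 256,
                 min_depth: float = 1e-3, max_depth: float = 10.0):
        super().__init__()
        self.min_depth, self.max_depth = min_depth, max_depth
        # Predicts one width per bin from globally pooled decoder features
        # (adaptive discretization of the depth range).
        self.bin_regressor = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_channels, n_bins),
        )
        # Per-pixel classification logits over the bins.
        self.classifier = nn.Conv2d(in_channels, n_bins, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b = feat.shape[0]
        # Normalized bin widths (sum to 1), scaled to cover [min_depth, max_depth].
        widths = torch.softmax(self.bin_regressor(feat), dim=1)
        widths = (self.max_depth - self.min_depth) * widths
        edges = self.min_depth + torch.cumsum(widths, dim=1)
        centers = edges - 0.5 * widths                         # (B, n_bins)
        # Per-pixel probability of each depth bin.
        probs = torch.softmax(self.classifier(feat), dim=1)    # (B, n_bins, H, W)
        # Final depth map: probability-weighted sum of adaptive bin centers.
        return (probs * centers.view(b, -1, 1, 1)).sum(dim=1, keepdim=True)


if __name__ == "__main__":
    head = AdaptiveDepthHead(in_channels=64)
    depth = head(torch.randn(2, 64, 120, 160))  # decoder features at reduced scale
    print(depth.shape)  # torch.Size([2, 1, 120, 160])
```

Because the bin widths are predicted per image, the discretization adapts to each scene's depth distribution instead of using fixed uniform or log-spaced bins.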