A Contour-Aware Monocular Depth Estimation Network Using Swin Transformer and Cascaded Multiscale Fusion

Cited by: 1
Authors
Li, Tao [1 ]
Zhang, Yi [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Keywords
Cascaded multiscale fusion; contour aware; monocular depth estimation; Swin Transformer;
DOI
10.1109/JSEN.2024.3370821
Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline Codes
0808; 0809
Abstract
Depth estimation from a monocular vision sensor is a fundamental problem in scene perception with wide industrial applications. Previous works tend to predict scene depth from high-level features extracted by convolutional neural networks (CNNs) or rely on encoder-decoder frameworks built on Transformers. However, these methods achieve less satisfactory results, especially around object contours. In this article, we propose a Transformer-based contour-aware depth estimation module that recovers scene depth with the aid of enhanced perception of object contours. In addition, we develop a cascaded multiscale fusion module to aggregate multilevel features, combining global context with local information and refining the depth map to higher resolution in a coarse-to-fine manner. Finally, we model depth estimation as a classification problem and discretize the depth range adaptively to further improve the performance of our network. Extensive experiments on mainstream public datasets (KITTI and NYUv2) demonstrate the effectiveness of our network, which exhibits superior performance against other state-of-the-art methods.
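The abstract's final idea, casting depth regression as classification over adaptively discretized depth bins, can be sketched as follows. This is a minimal illustration of the general technique (as popularized by adaptive-binning depth estimators), not the paper's exact formulation; the function name and shapes are assumptions for illustration. Per-pixel logits over K bins are converted to a soft distribution, and the continuous depth is recovered as the probability-weighted sum of bin centers:

```python
import numpy as np

def depth_from_classification(logits, bin_edges):
    """Recover a continuous depth map from per-pixel bin classification.

    logits:    (H, W, K) scores over K depth bins (adaptively placed)
    bin_edges: (K + 1,) increasing bin boundaries covering the depth range
    """
    # Softmax over the bin dimension yields a per-pixel distribution.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    # Each bin is represented by its center depth value.
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    # Expected depth: probability-weighted sum of bin centers,
    # which keeps the prediction differentiable and continuous.
    return (probs * centers).sum(axis=-1)
```

In the adaptive variant, `bin_edges` would themselves be predicted per image rather than fixed, letting the network concentrate bins where the scene's depth distribution is dense.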
Pages: 13620-13628 (9 pages)