Rich global feature guided network for monocular depth estimation

Cited by: 8
Authors
Wu, Bingyuan [1 ]
Wang, Yongxiong [1 ]
Affiliation
[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
Funding
Natural Science Foundation of Shanghai;
Keywords
Monocular depth estimation; Transformer; Large kernel convolution attention; Global feature;
DOI
10.1016/j.imavis.2022.104520
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Monocular depth estimation is a classical but challenging task in computer vision. In recent years, Convolutional Neural Network (CNN) based models have been developed to estimate high-quality depth maps from a single image. Most recently, Transformer based models have led to further improvements. Researchers are looking for better ways to handle the global processing of information, which is crucial for inferring depth relations but computationally expensive. In this paper, we combine the strengths of the Transformer and the CNN and propose a novel network architecture, called the Rich Global Feature Guided Network (RGFN), in which rich global features are extracted in both the encoder and the decoder. The RGFN follows the typical encoder-decoder framework for dense prediction. A hierarchical Transformer serves as the encoder to capture multi-scale contextual information and model long-range dependencies. In the decoder, Large Kernel Convolution Attention (LKCA) extracts global features at different scales and guides the network to progressively recover fine depth maps from low-resolution feature maps. Moreover, we apply a depth-specific data augmentation method, Vertical CutDepth, to boost performance. Experimental results on both indoor and outdoor datasets demonstrate the superiority of the RGFN over other state-of-the-art models. Compared with the recent method AdaBins, RGFN improves the RMSE score by 4.66% on the KITTI dataset and 4.67% on the NYU Depth v2 dataset. (c) 2022 Elsevier B.V. All rights reserved.
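The Large Kernel Convolution Attention mentioned in the abstract can be illustrated with a minimal sketch. The record does not give the exact formulation, so the decomposition below (a depthwise convolution, a dilated depthwise convolution, then a 1x1 pointwise convolution whose output reweights the input feature map) follows the common large-kernel-attention pattern and is an assumption; the kernel sizes, `dilation` value, and function names are illustrative, not the paper's actual implementation.

```python
import numpy as np

def depthwise_conv2d(x, k, dilation=1):
    """Naive depthwise 2D convolution with 'same' zero padding.
    x: (C, H, W) feature map; k: (C, kh, kw) per-channel kernels."""
    C, H, W = x.shape
    _, kh, kw = k.shape
    # Effective receptive field of a dilated kernel
    eff_h = dilation * (kh - 1) + 1
    eff_w = dilation * (kw - 1) + 1
    ph, pw = eff_h // 2, eff_w // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                patch = xp[c, i:i + eff_h:dilation, j:j + eff_w:dilation]
                out[c, i, j] = np.sum(patch * k[c])
    return out

def large_kernel_conv_attention(x, w_dw, w_dwd, w_pw, dilation=3):
    """Decomposed large-kernel attention: a 5x5 depthwise conv and a 7x7
    dilated depthwise conv approximate a large kernel cheaply; a pointwise
    (1x1) conv mixes channels; the result reweights x elementwise."""
    attn = depthwise_conv2d(x, w_dw)
    attn = depthwise_conv2d(attn, w_dwd, dilation=dilation)
    # Pointwise conv as a channel-mixing matrix: w_pw has shape (C_out, C_in)
    attn = np.einsum('oc,chw->ohw', w_pw, attn)
    return x * attn
```

With identity-like weights (delta kernels and an identity channel mix) the attention map equals the input, so the output is simply `x * x`; learned weights would instead aggregate context over the large effective receptive field before reweighting.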
Pages: 8
References (34 in total)
[1]   AdaBins: Depth Estimation Using Adaptive Bins [J].
Bhat, Shariq Farooq ;
Alhashim, Ibraheem ;
Wonka, Peter .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :4008-4017
[2]   ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial, and Multimap SLAM [J].
Campos, Carlos ;
Elvira, Richard ;
Gomez Rodriguez, Juan J. ;
Montiel, Jose M. M. ;
Tardos, Juan D. .
IEEE TRANSACTIONS ON ROBOTICS, 2021, 37 (06) :1874-1890
[3]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848
[4]   Exploring Simple Siamese Representation Learning [J].
Chen, Xinlei ;
He, Kaiming .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :15745-15753
[5]   Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [J].
Chibane, Julian ;
Alldieck, Thiemo ;
Pons-Moll, Gerard .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :6968-6979
[6]   Xception: Deep Learning with Depthwise Separable Convolutions [J].
Chollet, Francois .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1800-1807
[7]
Dosovitskiy A, 2021, arXiv:2010.11929, DOI 10.48550/ARXIV.2010.11929
[8]
Eigen D, 2014, Advances in Neural Information Processing Systems, Vol. 27
[9]   Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges [J].
Feng, Di ;
Haase-Schutz, Christian ;
Rosenbaum, Lars ;
Hertlein, Heinz ;
Glaser, Claudius ;
Timm, Fabian ;
Wiesbeck, Werner ;
Dietmayer, Klaus .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2021, 22 (03) :1341-1360
[10]   Deep Ordinal Regression Network for Monocular Depth Estimation [J].
Fu, Huan ;
Gong, Mingming ;
Wang, Chaohui ;
Batmanghelich, Kayhan ;
Tao, Dacheng .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :2002-2011