High quality monocular depth estimation with parallel decoder

Cited by: 0
Authors
Jiatao Liu
Yaping Zhang
Affiliation
[1] Yunnan Normal University, School of Information Science and Technology
Source
Scientific Reports, Vol. 12
DOI
Not available
Abstract
Monocular depth estimation aims to efficiently recover depth information in three-dimensional (3D) space from a single image, which is an ill-posed problem. Recently, Transformer-based architectures have achieved excellent accuracy in monocular depth estimation. However, owing to the characteristics of the Transformer, these models have very large parameter counts and slow inference. In traditional convolutional neural network-based architectures, many encoder-decoders serially fuse the multi-scale features from each encoder stage and then output predictions. With these approaches, however, it may be difficult to recover the spatial information that the encoder loses during pooling and convolution. To improve on this serial structure, we propose a structure designed from the decoder perspective, which first predicts global and local depth information in parallel and then fuses the two. Results show that this structure is an effective improvement over traditional methods and achieves accuracy comparable with that of state-of-the-art methods in both indoor and outdoor scenes, with fewer parameters and less computation. Moreover, ablation studies verify the effectiveness of the proposed decoder.
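The parallel idea in the abstract (predict global and local depth side by side, then fuse) can be illustrated with a toy sketch. This is not the authors' network: the abstract does not specify the prediction heads or the fusion operator, so the function names (`global_branch`, `local_branch`, `fuse`, `parallel_decode`), the nearest-neighbour upsampling, the mean-subtracted detail term, and the additive fusion are all assumptions made for illustration. Plain nested lists stand in for feature maps.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of a 2D grid by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def global_branch(coarse: Grid, factor: int) -> Grid:
    """Hypothetical global head: coarse scene layout from a
    low-resolution map, upsampled to the output resolution."""
    return upsample_nearest(coarse, factor)


def local_branch(fine: Grid) -> Grid:
    """Hypothetical local head: zero-mean per-pixel detail from a
    high-resolution map (each value minus the map's mean)."""
    count = sum(len(row) for row in fine)
    mean = sum(sum(row) for row in fine) / count
    return [[value - mean for value in row] for row in fine]


def fuse(global_depth: Grid, local_depth: Grid) -> Grid:
    """Assumed fusion: element-wise sum of the two predictions."""
    return [
        [g + l for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(global_depth, local_depth)
    ]


def parallel_decode(coarse: Grid, fine: Grid, factor: int) -> Grid:
    """Run both branches independently, then fuse their outputs."""
    global_depth = global_branch(coarse, factor)
    local_depth = local_branch(fine)
    if len(global_depth) != len(local_depth) or any(
        len(g) != len(l) for g, l in zip(global_depth, local_depth)
    ):
        raise ValueError("branch outputs must have the same shape")
    return fuse(global_depth, local_depth)


if __name__ == "__main__":
    coarse = [[2.0]]                    # 1x1 low-resolution map
    fine = [[1.0, 3.0], [1.0, 3.0]]     # 2x2 high-resolution map
    print(parallel_decode(coarse, fine, factor=2))
```

Neither branch consumes the other's output, which is the contrast with a serial decoder, where each stage refines the result of the previous one. In a real model each branch would be a learned module and the fusion step would likely be learned as well.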