Deep Ordinal Regression Network for Monocular Depth Estimation

Cited by: 1219
Authors
Fu, Huan [1]
Gong, Mingming [2, 3]
Wang, Chaohui [4]
Batmanghelich, Kayhan [2]
Tao, Dacheng [1]
Affiliations
[1] Univ Sydney, FEIT, SIT, UBTECH Sydney AI Ctr, Sydney, NSW, Australia
[2] Univ Pittsburgh, Dept Biomed Informat, Pittsburgh, PA 15260 USA
[3] Carnegie Mellon Univ, Dept Philosophy, Pittsburgh, PA 15213 USA
[4] Univ Paris Est, UPEM, ESIEE Paris, LIGM, CNRS, ENPC, UMR 8049, Marne La Vallee, France
Source
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018
Funding
Australian Research Council
Keywords
IMAGE;
DOI
10.1109/CVPR.2018.00214
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have achieved significant improvements by exploring image-level information and hierarchical features from deep convolutional neural networks (DCNNs). These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions. In addition, existing depth estimation networks employ repeated spatial pooling operations, resulting in undesirable low-resolution feature maps. To obtain high-resolution depth maps, skip-connections or multilayer deconvolution networks are required, which complicates network training and requires much more computation. To eliminate or at least largely reduce these problems, we introduce a spacing-increasing discretization (SID) strategy to discretize depth and recast depth network learning as an ordinal regression problem. By training the network with an ordinal regression loss, our method achieves much higher accuracy and faster convergence at the same time. Furthermore, we adopt a multi-scale network structure which avoids unnecessary spatial pooling and captures multi-scale information in parallel. The proposed deep ordinal regression network (DORN) achieves state-of-the-art results on three challenging benchmarks, i.e., KITTI [16], Make3D [49], and NYU Depth v2 [41], and outperforms existing methods by a large margin.
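The abstract names the spacing-increasing discretization but does not spell it out. The following is a minimal sketch, assuming the bin edges are spaced uniformly in log-depth between a minimum depth `alpha` and a maximum depth `beta`, so that bins widen as depth grows. The function names, the depth range in the usage note, and the binary target encoding are illustrative assumptions, not the authors' released code.

```python
import math


def sid_thresholds(alpha, beta, K):
    """Return K + 1 bin edges t_0..t_K spaced uniformly in log-depth.

    Assumed form: t_i = exp(log(alpha) + log(beta / alpha) * i / K).
    """
    if not (0 < alpha < beta) or K < 1:
        raise ValueError("require 0 < alpha < beta and K >= 1")
    log_alpha = math.log(alpha)
    log_ratio = math.log(beta / alpha)
    return [math.exp(log_alpha + log_ratio * i / K) for i in range(K + 1)]


def ordinal_label(depth, edges):
    """Map a depth value to the index of its bin, clamped to [0, K - 1]."""
    K = len(edges) - 1
    # Count how many upper bin edges lie at or below the depth value.
    idx = sum(1 for t in edges[1:] if depth >= t)
    return min(idx, K - 1)


def ordinal_targets(label, K):
    """Encode a label as K - 1 binary targets: entry k is 1 if label > k."""
    return [1 if label > k else 0 for k in range(K - 1)]
```

For example, `sid_thresholds(1.0, 80.0, 4)` gives five edges whose spacing grows with depth, so a fixed error in metres costs less at long range than at short range. The binary targets are one common way to pose ordinal regression as a set of "is the depth beyond edge k" decisions.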
Pages: 2002 - 2011
Page count: 10
Related papers
63 entries in total
  • [31] Learning-Based, Automatic 2D-to-3D Image and Video Conversion
    Konrad, Janusz
    Wang, Meng
    Ishwar, Prakash
    Wu, Chen
    Mukherjee, Debargha
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013, 22 (09) : 3485 - 3496
  • [32] Semi-Supervised Deep Learning for Monocular Depth Map Prediction
    Kuznietsov, Yevhen
    Stuckler, Jorg
    Leibe, Bastian
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017 : 2215 - 2223
  • [33] Pulling Things out of Perspective
    Ladicky, L'ubor
    Shi, Jianbo
    Pollefeys, Marc
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014 : 89 - 96
  • [34] Deeper Depth Prediction with Fully Convolutional Residual Networks
    Laina, Iro
    Rupprecht, Christian
    Belagiannis, Vasileios
    Tombari, Federico
    Navab, Nassir
    [J]. PROCEEDINGS OF 2016 FOURTH INTERNATIONAL CONFERENCE ON 3D VISION (3DV), 2016 : 239 - 248
  • [35] Li B, 2015, PROC CVPR IEEE, P1119, DOI 10.1109/CVPR.2015.7298715
  • [36] Perceptual Generative Adversarial Networks for Small Object Detection
    Li, Jianan
    Liang, Xiaodan
    Wei, Yunchao
    Xu, Tingfa
    Feng, Jiashi
    Yan, Shuicheng
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017 : 1951 - 1959
  • [37] Li X., 2014, ACCV
  • [38] Single Image Depth Estimation From Predicted Semantic Labels
    Liu, Beyang
    Gould, Stephen
    Koller, Daphne
    [J]. 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010 : 1253 - 1260
  • [39] Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields
    Liu, Fayao
    Shen, Chunhua
    Lin, Guosheng
    Reid, Ian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (10) : 2024 - 2039
  • [40] Discrete-Continuous Depth Estimation from a Single Image
    Liu, Miaomiao
    Salzmann, Mathieu
    He, Xuming
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014 : 716 - 723