LocalBins: Improving Depth Estimation by Learning Local Distributions

Cited by: 59
Authors
Bhat, Shariq Farooq [1]
Alhashim, Ibraheem [2]
Wonka, Peter [1]
Affiliations
[1] KAUST, Thuwal, Saudi Arabia
[2] Saudi Data & Artificial Intelligence Authority SD, Natl Ctr Artificial Intelligence NCAI, Riyadh, Saudi Arabia
Source
COMPUTER VISION - ECCV 2022, PT I | 2022 / Vol. 13661
Keywords
Single image depth estimation; Encoder-decoder architecture; Deep learning; Dense regression; Histogram prediction;
DOI
10.1007/978-3-031-19769-7_28
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel architecture for depth estimation from a single image. The architecture itself is based on the popular encoder-decoder architecture that is frequently used as a starting point for all dense regression tasks. We build on AdaBins which estimates a global distribution of depth values for the input image and evolve the architecture in two ways. First, instead of predicting global depth distributions, we predict depth distributions of local neighborhoods at every pixel. Second, instead of predicting depth distributions only towards the end of the decoder, we involve all layers of the decoder. We call this new architecture LocalBins. Our results demonstrate a clear improvement over the state-of-the-art in all metrics on the NYU-Depth V2 dataset. Code and pretrained models will be made publicly available (https://github.com/sharigfarooq123/LocalBins).
Pages: 480-496
Page count: 17
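
The abstract contrasts a single global depth distribution per image with a separate distribution for the local neighborhood of each pixel. The record does not say how a distribution is turned into a depth value, so the sketch below assumes the bin-based formulation usually associated with AdaBins: depth is the probability-weighted average of bin centers. The function names, the depth range, and all numeric values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List


def bin_centers(widths: List[float], d_min: float, d_max: float) -> List[float]:
    """Turn non-negative bin widths into bin centers spanning [d_min, d_max]."""
    total = sum(widths)
    centers, left = [], d_min
    for w in widths:
        span = (d_max - d_min) * w / total  # width rescaled to the depth range
        centers.append(left + span / 2.0)
        left += span
    return centers


def expected_depth(centers: List[float], probs: List[float]) -> float:
    """Depth as the probability-weighted average of bin centers (assumed rule)."""
    return sum(c * p for c, p in zip(centers, probs)) / sum(probs)


# Hypothetical toy input: two pixels, three bins, depth range 0 to 10.
D_MIN, D_MAX = 0.0, 10.0
probs = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]

# Global variant: one set of bin widths shared by every pixel in the image.
global_centers = bin_centers([1.0, 1.0, 2.0], D_MIN, D_MAX)
global_depths = [expected_depth(global_centers, p) for p in probs]

# Local variant: each pixel has its own bin widths, hence its own centers.
local_widths = [[1.0, 1.0, 2.0], [2.0, 1.0, 1.0]]
local_depths = [
    expected_depth(bin_centers(w, D_MIN, D_MAX), p)
    for w, p in zip(local_widths, probs)
]

print(global_depths)
print(local_depths)
```

In this toy setup the first pixel gets the same depth in both variants because its local bins match the global ones, while the second pixel's depth changes once it is given its own bin layout.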