Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets

Cited by: 19
Authors
Chen, Tian [1 ]
An, Shijie [1 ]
Zhang, Yuan [1 ]
Ma, Chongyang [1 ]
Wang, Huayan [1 ]
Guo, Xiaoyan [1 ]
Zheng, Wen [1 ]
Affiliations
[1] Kuaishou Technology, Y-tech, Beijing, People's Republic of China
Source
Computer Vision - ECCV 2020, Part XIV | 2020, Vol. 12359
DOI
10.1007/978-3-030-58568-6_6
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Monocular depth estimation plays a crucial role in 3D recognition and understanding. One key limitation of existing approaches is that they fail to exploit structural information, which leads to inaccurate spatial layouts, discontinuous surfaces, and ambiguous boundaries. In this paper, we tackle this problem in three aspects. First, to exploit the spatial relationships among visual features, we propose a structure-aware neural network with spatial attention blocks. These blocks guide the network's attention to global structures or local details across different feature layers. Second, we introduce a global focal relative loss over uniformly sampled point pairs to enhance spatial constraints in the prediction; it explicitly increases the penalty on errors in depth-wise discontinuous regions, which helps preserve the sharpness of the estimation results. Finally, based on an analysis of failure cases of prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes, such as special lighting conditions, dynamic objects, and tilted camera angles. The new dataset is leveraged by an informed learning curriculum that mixes training examples incrementally to handle diverse data distributions. Experimental results show that our method outperforms state-of-the-art approaches by a large margin in terms of both prediction accuracy on the NYUDv2 dataset and generalization performance on unseen datasets.
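To make the second contribution more concrete, below is a minimal PyTorch-style sketch of a focal-weighted relative depth loss over uniformly sampled point pairs. The abstract does not give the exact formulation, so the pairwise ranking term, the focal exponent gamma, the equality threshold tau, and the function name focal_relative_loss are illustrative assumptions rather than the authors' definition.

```python
# Sketch only: a focal-weighted relative (ranking) depth loss over sampled point
# pairs. The pairwise ranking term and focal weighting are assumptions; they are
# not taken from the paper's actual loss definition.
import torch
import torch.nn.functional as F


def focal_relative_loss(pred_depth, gt_depth, num_pairs=1024, gamma=2.0, tau=0.02):
    """pred_depth, gt_depth: (B, H, W) tensors; returns a scalar loss."""
    b, h, w = pred_depth.shape
    flat_pred = pred_depth.reshape(b, -1)
    flat_gt = gt_depth.reshape(b, -1)

    # Uniformly sample point pairs per image (uniform sampling is an assumption).
    idx_a = torch.randint(0, h * w, (b, num_pairs), device=pred_depth.device)
    idx_b = torch.randint(0, h * w, (b, num_pairs), device=pred_depth.device)

    pa, pb = flat_pred.gather(1, idx_a), flat_pred.gather(1, idx_b)
    ga, gb = flat_gt.gather(1, idx_a), flat_gt.gather(1, idx_b)

    # Ordinal label per pair: +1 / -1 if one point is clearly farther, 0 if roughly equal.
    ratio = ga / gb.clamp(min=1e-6)
    sign = torch.zeros_like(ratio)
    sign[ratio > 1.0 + tau] = 1.0
    sign[ratio < 1.0 - tau] = -1.0

    diff = pa - pb
    # Ranking term for ordered pairs; squared term pulls "equal" pairs together.
    rank_loss = F.softplus(-sign * diff)
    eq_loss = diff.pow(2)

    # Focal-style weight (detached): down-weight pairs the network already orders
    # with a large correct margin, so hard pairs, e.g. across depth discontinuities,
    # dominate the gradient.
    p_correct = torch.sigmoid(sign * diff).detach()
    focal_weight = (1.0 - p_correct).pow(gamma)

    loss = torch.where(sign != 0, focal_weight * rank_loss, eq_loss)
    return loss.mean()
```

In practice such a term would likely be combined with a standard per-pixel depth loss; the focal factor merely shifts the emphasis toward pairs whose ordinal relation the network still predicts poorly, which is where depth discontinuities tend to lie.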
Pages: 90-108
Number of pages: 19