MonoVAN: Visual Attention for Self-Supervised Monocular Depth Estimation

Cited by: 12
Authors
Indyk, Ilia [1 ]
Makarov, Ilya [2 ]
Affiliations
[1] HSE Univ, Moscow, Russia
[2] Artificial Intelligence Res Inst AIRI, AI Ctr NUST MISiS, Moscow, Russia
Source
2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2023
Keywords
Human-centered computing - Human computer interaction (HCI) - Interaction paradigms - Mixed / augmented reality; Artificial intelligence - Computer vision - Localization, spatial registration and tracking - 3D reconstruction
DOI
10.1109/ISMAR59233.2023.00138
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Depth estimation is crucial in various computer vision applications, including autonomous driving, robotics, and virtual and augmented reality. An accurate scene depth map is beneficial for localization, spatial registration, and tracking. It converts 2D images into precise 3D coordinates for accurate positioning, seamlessly aligns virtual and real objects in applications such as AR, and enhances object tracking by distinguishing distances. The self-supervised monocular approach is particularly promising, as it eliminates the need for complex and expensive data acquisition setups and relies solely on a standard RGB camera. Recently, transformer-based architectures have become popular for this problem, but while achieving high quality they suffer from high computational cost and poor perception of small details, since they focus more on global information. In this paper, we propose a novel fully convolutional network for monocular depth estimation, called MonoVAN, which incorporates a visual attention mechanism and applies super-resolution techniques in the decoder to better capture fine-grained details in depth maps. To the best of our knowledge, this work pioneers the use of convolutional visual attention in the context of depth estimation. Our experiments on the outdoor KITTI benchmark and the indoor NYUv2 dataset show that our approach outperforms the most advanced self-supervised methods, including state-of-the-art models such as the transformer-based VTDepth from ISMAR'22 and the hybrid convolutional-transformer MonoFormer from AAAI'23, while having a comparable or even smaller number of parameters than its competitors. We also validate the impact of each proposed improvement in isolation, providing evidence of its significant contribution. Code and weights are available at https://github.com/IlyaInd/MonoVAN.
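For illustration, the sketch below shows a large-kernel attention (LKA) block in the style of the Visual Attention Network (VAN), the kind of convolutional visual attention the abstract refers to. This is a minimal, assumption-based sketch following the original VAN design (5x5 depthwise conv, 7x7 depthwise conv with dilation 3, 1x1 pointwise conv, multiplicative gating), not the exact MonoVAN block; the authoritative implementation is in the linked repository.

```python
import torch
import torch.nn as nn


class LargeKernelAttention(nn.Module):
    """VAN-style convolutional attention (illustrative sketch).

    A large receptive field is approximated by a 5x5 depthwise conv,
    a 7x7 depthwise conv with dilation 3, and a 1x1 pointwise conv;
    the resulting attention map gates the input feature map.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.dw_conv = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, kernel_size=7, padding=9,
                                    dilation=3, groups=dim)
        self.pw_conv = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        return x * attn  # element-wise gating preserves the spatial resolution


if __name__ == "__main__":
    feats = torch.randn(1, 64, 48, 160)           # e.g. an encoder feature map
    out = LargeKernelAttention(64)(feats)
    print(out.shape)                              # torch.Size([1, 64, 48, 160])
```

Because every convolution here is padded to preserve spatial size, such a block can be dropped into an encoder stage without changing downstream feature-map shapes, which is what makes this form of attention cheap compared to global self-attention.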
Pages: 1211-1220
Page count: 10