Attention meets Geometry: Geometry Guided Spatial-Temporal Attention for Consistent Self-Supervised Monocular Depth Estimation

Cited by: 22
Authors
Ruhkamp, Patrick [1]
Gao, Daoyi [1]
Chen, Hanzhi [1]
Navab, Nassir [1]
Busam, Benjamin [1]
Affiliations
[1] Tech Univ Munich, Munich, Germany
Source
2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021) | 2021
Keywords
DOI
10.1109/3DV53792.2021.00092
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Inferring geometrically consistent dense 3D scenes across a tuple of temporally consecutive images remains challenging for self-supervised monocular depth prediction pipelines. This paper explores how the increasingly popular transformer architecture, together with novel regularized loss formulations, can improve depth consistency while preserving accuracy. We propose a spatial attention module that correlates coarse depth predictions to aggregate local geometric information. A novel temporal attention mechanism further processes the local geometric information in a global context across consecutive images. Additionally, we introduce geometric constraints between frames regularized by photometric cycle consistency. By combining our proposed regularization and the novel spatial-temporal-attention module we fully leverage both the geometric and appearance-based consistency across monocular frames. This yields geometrically meaningful attention and improves temporal depth stability and accuracy compared to previous methods.
Pages: 837-847
Page count: 11