Transformers in Unsupervised Structure-from-Motion

Authors
Chawla, Hemang [1, 2]
Varma, Arnav [1]
Arani, Elahe [1, 2]
Zonooz, Bahram [1, 2]
Affiliations
[1] NavInfo Europe, Adv Res Lab, Eindhoven, Netherlands
[2] Eindhoven Univ Technol, Dept Math & Comp Sci, Eindhoven, Netherlands
Keywords
Structure-from-motion; Monocular depth estimation; Monocular pose estimation; Camera calibration; Natural corruptions; Adversarial attacks; VISION
DOI
10.1007/978-3-031-45725-8_14
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Transformers have revolutionized deep-learning-based computer vision with improved performance as well as robustness to natural corruptions and adversarial attacks. Transformers are used predominantly for 2D vision tasks, including image classification, semantic segmentation, and object detection. However, robots and advanced driver assistance systems also require 3D scene understanding for decision making, which they obtain by extracting structure-from-motion (SfM). We propose a robust transformer-based monocular SfM method that simultaneously learns to predict monocular pixel-wise depth, the ego vehicle's translation and rotation, and the camera's focal length and principal point. With experiments on the KITTI and DDAD datasets, we demonstrate how to adapt different vision transformers and compare them against contemporary CNN-based methods. Our study shows that the transformer-based architecture, though lower in run-time efficiency, achieves comparable performance while being more robust against natural corruptions, as well as untargeted and targeted attacks. (Code: https://github.com/NeurAI-Lab/MT-SfMLearner).
Pages: 281-303
Number of pages: 23