Transformers in Unsupervised Structure-from-Motion

Authors
Chawla, Hemang [1, 2]
Varma, Arnav [1]
Arani, Elahe [1, 2]
Zonooz, Bahram [1, 2]
Affiliations
[1] NavInfo Europe, Adv Res Lab, Eindhoven, Netherlands
[2] Eindhoven Univ Technol, Dept Math & Comp Sci, Eindhoven, Netherlands
Keywords
Structure-from-motion; Monocular depth estimation; Monocular pose estimation; Camera calibration; Natural corruptions; Adversarial attacks; VISION;
DOI
10.1007/978-3-031-45725-8_14
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Transformers have revolutionized deep-learning-based computer vision with improved performance as well as robustness to natural corruptions and adversarial attacks. Transformers are used predominantly for 2D vision tasks, including image classification, semantic segmentation, and object detection. However, robots and advanced driver assistance systems also require 3D scene understanding for decision making by extracting structure-from-motion (SfM). We propose a robust transformer-based monocular SfM method that learns to predict monocular pixel-wise depth, the ego vehicle's translation and rotation, and the camera's focal length and principal point, simultaneously. With experiments on the KITTI and DDAD datasets, we demonstrate how to adapt different vision transformers and compare them against contemporary CNN-based methods. Our study shows that the transformer-based architecture, though lower in run-time efficiency, achieves comparable performance while being more robust against natural corruptions, as well as untargeted and targeted attacks. (Code: https://github.com/NeurAI-Lab/MT-SfMLearner)
Pages: 281-303
Page count: 23