Transformers in Unsupervised Structure-from-Motion

Cited by: 0

Authors:

Chawla, Hemang [1,2]

Varma, Arnav [1]

Arani, Elahe [1,2]

Zonooz, Bahram [1,2]

Affiliations:

[1] NavInfo Europe, Adv Res Lab, Eindhoven, Netherlands

[2] Eindhoven Univ Technol, Dept Math & Comp Sci, Eindhoven, Netherlands

Source:

COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2022 | 2023 / Vol. 1815

Keywords:

Structure-from-motion; Monocular depth estimation; Monocular pose estimation; Camera calibration; Natural corruptions; Adversarial attacks; VISION;

DOI:

10.1007/978-3-031-45725-8_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Transformers have revolutionized deep-learning-based computer vision, improving both performance and robustness to natural corruptions and adversarial attacks. Transformers are used predominantly for 2D vision tasks, including image classification, semantic segmentation, and object detection. However, robots and advanced driver assistance systems also require 3D scene understanding for decision making, which they obtain by extracting structure-from-motion (SfM). We propose a robust transformer-based monocular SfM method that simultaneously learns to predict monocular pixel-wise depth, the ego vehicle's translation and rotation, and the camera's focal length and principal point. With experiments on the KITTI and DDAD datasets, we demonstrate how to adapt different vision transformers and compare them against contemporary CNN-based methods. Our study shows that the transformer-based architecture, though lower in run-time efficiency, achieves comparable performance while being more robust against natural corruptions, as well as untargeted and targeted attacks. (Code: https://github.com/NeurAI-Lab/MT-SfMLearner).


Pages: 281-303

Number of pages: 23
