Transformers in Unsupervised Structure-from-Motion

Cited by: 0

Authors:

Chawla, Hemang [1,2]

Varma, Arnav [1]

Arani, Elahe [1,2]

Zonooz, Bahram [1,2]

Affiliations:

[1] NavInfo Europe, Advanced Research Lab, Eindhoven, Netherlands

[2] Eindhoven University of Technology, Department of Mathematics and Computer Science, Eindhoven, Netherlands

Source:

COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2022 | 2023, Vol. 1815

Keywords:

Structure-from-motion; Monocular depth estimation; Monocular pose estimation; Camera calibration; Natural corruptions; Adversarial attacks; VISION;

DOI:

10.1007/978-3-031-45725-8_14

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Transformers have revolutionized deep-learning-based computer vision with improved performance as well as robustness to natural corruptions and adversarial attacks. Transformers are used predominantly for 2D vision tasks, including image classification, semantic segmentation, and object detection. However, robots and advanced driver assistance systems also require 3D scene understanding for decision making by extracting structure-from-motion (SfM). We propose a robust transformer-based monocular SfM method that learns to predict monocular pixel-wise depth, the ego vehicle's translation and rotation, as well as the camera's focal length and principal point, simultaneously. With experiments on KITTI and DDAD datasets, we demonstrate how to adapt different vision transformers and compare them against contemporary CNN-based methods. Our study shows that the transformer-based architecture, though lower in run-time efficiency, achieves comparable performance while being more robust against natural corruptions, as well as untargeted and targeted attacks. (Code: https://github.com/NeurAI-Lab/MT-SfMLearner).
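In unsupervised SfM pipelines of this kind, the quantities listed in the abstract (per-pixel depth, rotation, translation, focal length, principal point) are typically tied together by warping pixels from one frame into an adjacent one and penalizing the photometric difference. The following is a minimal, generic sketch of that reprojection step for a single pixel, assuming a zero-skew pinhole camera and a target-to-source rigid transform. The function names (`intrinsics`, `rotation_y`, `reproject`) and the example values are hypothetical; this is not code from the linked repository.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def intrinsics(fx: float, fy: float, cx: float, cy: float) -> Matrix3:
    """Zero-skew pinhole intrinsic matrix K from focal lengths and principal point."""
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def rotation_y(angle: float) -> Matrix3:
    """Rotation about the camera's vertical (y) axis, i.e. yaw for a forward-facing camera."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def matvec(m: Matrix3, v: Sequence[float]) -> Tuple[float, ...]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def reproject(u: float, v: float, depth: float, K: Matrix3,
              R: Matrix3, t: Sequence[float]) -> Tuple[float, float]:
    """Warp target pixel (u, v) with predicted depth into the source view:
    p_s ~ K (R * depth * K^-1 [u, v, 1]^T + t)."""
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    # Back-project with K^-1 (closed form for zero skew) and scale by the predicted depth.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the predicted ego-motion (rotation R, then translation t) into the source frame.
    rx, ry, rz = matvec(R, point)
    x, y, z = rx + t[0], ry + t[1], rz + t[2]
    if z <= 0.0:
        raise ValueError("point projects behind the source camera")
    # Project back to pixel coordinates with the same (predicted) intrinsics K.
    return fx * x / z + cx, fy * y / z + cy


if __name__ == "__main__":
    K = intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=40.0)
    # The source camera sits 5 units further along the optical axis, so the
    # scene point is closer and lands farther from the principal point.
    print(reproject(60.0, 40.0, 10.0, K, rotation_y(0.0), (0.0, 0.0, -5.0)))
```

Because the focal length and principal point enter both the back-projection and the projection, a photometric loss computed on the warped pixels also provides a gradient with respect to the intrinsics. This is what makes learning them jointly with depth and pose possible in principle. Real implementations apply the same mapping to every pixel at once and sample the source image with bilinear interpolation.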


Pages: 281-303

Number of pages: 23
