A Light Touch Approach to Teaching Transformers Multi-view Geometry

Cited by: 3

Authors:

Bhalgat, Yash [1]

Henriques, Joao F. [1]

Zisserman, Andrew [1]

Affiliations:

[1] Univ Oxford, Visual Geometry Grp, Oxford, England

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC)

Keywords:

DOI:

10.1109/CVPR52729.2023.00480

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps during training, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test-time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle, due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval, without needing pose information at test-time.
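The epipolar guidance described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the pixel-distance threshold, and the use of a binary cross-entropy penalty are assumptions made for illustration, and the fundamental matrix `F` is assumed to be available during training only.

```python
import numpy as np

def epipolar_mask(F, query_xy, height, width, thresh=1.0):
    """Binary mask of target-image pixels within `thresh` pixels of the
    epipolar line induced by the query pixel `query_xy` = (x, y).
    Hypothetical helper; F is a 3x3 fundamental matrix (assumed known)."""
    x = np.array([query_xy[0], query_xy[1], 1.0])
    a, b, c = F @ x                      # line a*u + b*v + c = 0 in the target image
    v, u = np.mgrid[0:height, 0:width]   # v = row index, u = column index
    dist = np.abs(a * u + b * v + c) / (np.hypot(a, b) + 1e-12)
    return (dist <= thresh).astype(np.float64)

def epipolar_attention_loss(attn, mask, eps=1e-8):
    """Binary cross-entropy between an attention map with values in [0, 1]
    and the epipolar mask (an assumed choice of penalty): attention off the
    line is penalized, attention on the line is encouraged."""
    attn = np.clip(attn, eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(attn) + (1.0 - mask) * np.log(1.0 - attn)))

# Toy example: a pure horizontal camera translation gives horizontal epipolar lines.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
mask = epipolar_mask(F, query_xy=(3, 2), height=5, width=6, thresh=0.5)
on_line = 0.9 * mask + 0.05              # attention concentrated on the line
off_line = 1.0 - on_line                 # attention concentrated off the line
loss_on = epipolar_attention_loss(on_line, mask)
loss_off = epipolar_attention_loss(off_line, mask)
```

In this toy setup the attention map that follows the epipolar line receives the smaller loss. Because the mask enters only through the training loss, nothing in the sketch needs camera pose at inference, which matches the test-time property stated in the abstract.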


Pages: 4958-4969

Page count: 12
