A Light Touch Approach to Teaching Transformers Multi-view Geometry

Cited by: 3
Authors
Bhalgat, Yash [1]
Henriques, Joao F. [1]
Zisserman, Andrew [1]
Affiliations
[1] Univ Oxford, Visual Geometry Grp, Oxford, England
Source
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1109/CVPR52729.2023.00480
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps during training, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test-time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle, due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval, without needing pose information at test-time.
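The epipolar guidance described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a known fundamental matrix `F` at training time, a flat list of reference-image patch centres `grid_xy`, and a binary cross-entropy penalty. The abstract does not state the form of the loss, and the function names and the `max_dist` threshold are hypothetical.

```python
import numpy as np

def epipolar_mask(F, query_xy, grid_xy, max_dist=1.0):
    """Mark reference locations lying within max_dist of the epipolar line.

    F        : (3, 3) fundamental matrix mapping query points to lines
               in the reference image (assumed available during training).
    query_xy : (x, y) location in the query image.
    grid_xy  : (N, 2) array of (u, v) patch centres in the reference image.
    """
    x = np.array([query_xy[0], query_xy[1], 1.0])
    a, b, c = F @ x                      # line a*u + b*v + c = 0
    norm = np.hypot(a, b)
    if norm < 1e-12:                     # degenerate case: no well-defined line
        return np.zeros(len(grid_xy), dtype=bool)
    dist = np.abs(a * grid_xy[:, 0] + b * grid_xy[:, 1] + c) / norm
    return dist <= max_dist

def epipolar_attention_loss(attn, mask, eps=1e-7):
    """Binary cross-entropy between attention values in [0, 1] and the mask.

    High attention on the line and low attention off the line both reduce
    the loss; this is one plausible penalty, chosen here for illustration.
    """
    attn = np.clip(attn, eps, 1.0 - eps)
    target = mask.astype(float)
    bce = -(target * np.log(attn) + (1.0 - target) * np.log(1.0 - attn))
    return float(bce.mean())
```

Because such a term would only be added to the training objective, nothing in this sketch is needed at inference, which is consistent with the abstract's statement that no camera pose information is required at test time.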
Pages: 4958-4969
Page count: 12