Progressive Temporal Transformer for Bird's-Eye-View Camera Pose Estimation

Times Cited: 0
Authors
Wu, Zhuoyuan [1]
Cai, Jiancheng [1]
Huang, Ranran [1]
Liu, Xinmin [1]
Chai, Zhenhua [1]
Affiliation
[1] Meituan, 7 Rongda Rd, Beijing 100012, People's Republic of China
Source
NEURAL INFORMATION PROCESSING, ICONIP 2023, PT VI | 2024 / Vol. 14452
Keywords
Camera Pose Estimation; Bird's-Eye-View; Transformer
DOI
10.1007/978-981-99-8076-5_10
Chinese Library Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Visual relocalization, the task of predicting the 6-DoF camera pose of a query image, is a crucial technique in visual odometry and SLAM. Existing work mainly focuses on ground-level views of indoor or outdoor scenes, whereas camera relocalization on unmanned aerial vehicles has received less attention. It is also more challenging, because of frequent view changes and large depths of field. In this work, we establish a Bird's-Eye-View (BEV) dataset for camera relocalization: a large dataset containing four distinct scenes (roof, farmland, bare ground, and urban area) that exhibit challenging conditions such as frequent view changes, repetitive or weak textures, and large depths of field. Every image in the dataset is associated with a ground-truth camera pose. With 177,242 images, the BEV dataset is a challenging large-scale benchmark for camera relocalization. We also propose a Progressive Temporal transFormer (dubbed PTFormer) as a baseline model. PTFormer is a sequence-based transformer with a progressive temporal aggregation module that exploits temporal correlation, and a parallel absolute and relative prediction head that implicitly models the temporal constraint. Thorough experiments on the BEV dataset and on the widely used handheld datasets 7Scenes and Cambridge Landmarks demonstrate the robustness of the proposed method.
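The abstract does not detail how the parallel absolute and relative prediction head enforces the temporal constraint. As a rough illustration only, the sketch below shows one common way to relate the two kinds of output, similar in spirit to the relative-pose terms in geometry-aware losses such as MapNet (Brahmbhatt et al., CVPR 2018): the relative pose between consecutive frames is derived from the predicted absolute poses and compared with a directly predicted relative pose. It assumes camera-to-world poses, unit quaternions in (w, x, y, z) order, and an L1 distance weighted by a hypothetical `beta`. All function names are hypothetical, and this explicit consistency term is not PTFormer's actual loss, which the abstract describes as modeling the constraint implicitly.

```python
import math
from typing import Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit norm
Vec3 = Tuple[float, float, float]
Pose = Tuple[Vec3, Quat]  # (translation, rotation), assumed camera-to-world


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q: Quat) -> Quat:
    """Conjugate, which equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q via q * (0, v) * q^-1."""
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), quat_conj(q))
    return (x, y, z)


def relative_pose(p_i: Pose, p_j: Pose) -> Pose:
    """Hypothetical helper: pose of frame j expressed in frame i's coordinates."""
    t_i, q_i = p_i
    t_j, q_j = p_j
    q_i_inv = quat_conj(q_i)
    dt = tuple(b - a for a, b in zip(t_i, t_j))
    return quat_rotate(q_i_inv, dt), quat_mul(q_i_inv, q_j)


def pose_l1(p: Pose, r: Pose, beta: float = 1.0) -> float:
    """L1 translation error plus beta * L1 quaternion error.

    The quaternion term takes the smaller of the errors against r and -r,
    because q and -q encode the same rotation.
    """
    (t_a, q_a), (t_b, q_b) = p, r
    t_err = sum(abs(a - b) for a, b in zip(t_a, t_b))
    q_err = min(
        sum(abs(a - b) for a, b in zip(q_a, q_b)),
        sum(abs(a + b) for a, b in zip(q_a, q_b)),
    )
    return t_err + beta * q_err


def temporal_consistency_loss(
    abs_pred: Sequence[Pose], rel_pred: Sequence[Pose], beta: float = 1.0
) -> float:
    """Hypothetical loss: mean disagreement between each predicted relative
    pose and the one implied by consecutive predicted absolute poses."""
    assert len(rel_pred) == len(abs_pred) - 1 and rel_pred
    losses = [
        pose_l1(relative_pose(abs_pred[k], abs_pred[k + 1]), rel_pred[k], beta)
        for k in range(len(rel_pred))
    ]
    return sum(losses) / len(losses)
```

For two frames that share a 90-degree yaw, where the camera advances one unit along its own x-axis (world translation (0, 1, 0)), the derived relative pose is approximately ((1, 0, 0), (1, 0, 0, 0)), so a matching relative prediction yields a loss near zero. Working in each frame's local coordinates keeps the relative target independent of where the sequence sits in the world frame.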
Pages: 133-147
Number of pages: 15