Cross-view SLAM solver: Global pose estimation of monocular ground-level video frames for 3D reconstruction using a reference 3D model from satellite images

Cited by: 17
Authors
Elhashash, Mostafa [1, 3]
Qin, Rongjun [1, 2, 3, 4]
Affiliations
[1] Ohio State Univ, Geospatial Data Analyt Lab, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Civil Environm & Geodet Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH 43210 USA
[4] Ohio State Univ, Translat Data Analyt Inst, Columbus, OH 43210 USA
Keywords
SLAM; Pose estimation; 3D modeling; Cross-view data fusion; Localization; Satellite 3D model; DIGITAL SURFACE MODEL; STEREO; LOCALIZATION; GENERATION; TIME;
DOI
10.1016/j.isprsjprs.2022.03.018
Chinese Library Classification (CLC) number
P9 [Physical Geography];
Discipline classification code
0705 ; 070501 ;
Abstract
Accurate pose estimation of monocular ground-level images with respect to a satellite/aerial photogrammetric dataset is an extremely challenging task. Existing solutions often perform an offline post-registration of the 3D results from both sources, which suffers from non-rigid geometric distortions in the monocular 3D reconstruction and from the lack of overlap between aerial and ground content. This paper provides an online solution that performs accurate pose estimation of the ground images with respect to a 3D model derived from satellite images, followed by a dense 3D reconstruction. Our solution uses the simultaneous localization and mapping (SLAM) paradigm to dynamically incorporate reference observations from the satellite 3D model during incremental pose estimation. The resulting cross-view SLAM solver leverages both ground-to-satellite errors and image-level reprojection errors at the frame level to yield image poses that are well registered to the satellite 3D model for facade point cloud reconstruction. This process also corrects the non-rigid distortions and trajectory drift that are often present in monocular SLAM systems. In addition, our solution leverages both the geometric and semantic information from the satellite model and ground images to perform a per-frame correction for frame-level pose initialization, in which a novel scheme called the pose buffer is introduced to initialize the pose of each keyframe through robust visual hull alignment of ground objects. The proposed approach was tested on four trajectories of monocular video collections (around 7,000 frames per trajectory on average) together with a 3D semantic model from multi-view satellite images, estimating the poses of the video frames and yielding point clouds consistent with the satellite 3D model, with evaluation against LiDAR ground truth.
Both qualitative and quantitative experiments demonstrate that our solution yields accurate, drift-free poses and point clouds consistent with the satellite data, as well as visually much more pleasing 3D models with facade information. Compared with the LiDAR ground truth, the 3D models derived from ground-level images achieve a mean absolute error of 1.78 m, improved from 3.15 m for SLAM without the satellite 3D model. (A testing program will be made available through https://github.com/GDAOSU/Cross-View-SLAM.)
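As an illustration only, the idea of combining image-level reprojection errors with a ground-to-satellite error in one objective can be sketched as below. This is not the authors' implementation: the abstract does not define the ground-to-satellite term, so the sketch assumes it is the squared 3D distance between each map point and a pre-matched point on the satellite model. The names `project`, `joint_cost`, `sat_points`, and `weight` are hypothetical.

```python
import numpy as np


def project(K, R, t, points_world):
    """Project Nx3 world points to pixels with a pinhole camera (x_cam = R x + t)."""
    cam = points_world @ R.T + t      # world frame -> camera frame
    uvw = cam @ K.T                   # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]   # perspective division


def joint_cost(K, R, t, points_world, pixels, sat_points, weight=1.0):
    """Squared reprojection error plus a weighted 3D reference term.

    ASSUMPTION: the ground-to-satellite term is modelled as the squared
    distance between each map point and its pre-matched satellite-model
    point; the paper's actual formulation may differ.
    """
    reproj = project(K, R, t, points_world) - pixels   # Nx2 pixel residuals
    ground_to_sat = points_world - sat_points          # Nx3 metric residuals
    return float(np.sum(reproj ** 2) + weight * np.sum(ground_to_sat ** 2))


# Toy example: identity pose, two map points, perfect observations.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
points = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 2.0]])
pixels = project(K, R, t, points)
cost_perfect = joint_cost(K, R, t, points, pixels, points)
```

In a real solver the two terms have different units (pixels and metres), so the weight would need to be chosen with care; a nonlinear least-squares library would minimize such a cost over the poses and map points rather than just evaluate it.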
Pages: 62-74
Page count: 13