Alignment of Deep Features in 3D Models for Camera Pose Estimation

Times Cited: 0
Authors
Su, Jui-Yuan [1,2]
Cheng, Shyi-Chyi [2]
Chang, Chin-Chun [2]
Hsieh, Jun-Wei [2]
Affiliations
[1] Ming Chuan Univ, Dept New Media & Commun Adm, Taipei, Taiwan
[2] Natl Taiwan Ocean Univ, Dept Comp Sci & Informat Engn, Keelung, Taiwan
Source
MULTIMEDIA MODELING, MMM 2019, PT II | 2019, Vol. 11296
Keywords
Unsupervised fragment classification; 3D model; Deep learning; Camera pose estimation; 3D point cloud clustering
DOI
10.1007/978-3-030-05716-9_36
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Given a set of semantically annotated RGB-D images with known camera poses, many existing 3D reconstruction algorithms can integrate the images into a single 3D model of the scene. Such a semantically annotated scene model facilitates the construction of a video surveillance system with a moving camera, provided that the depth maps of the captured images can be computed efficiently and the camera poses can be estimated. The proposed model-based video surveillance consists of two phases: a modeling phase and an inspection phase. In the modeling phase, we carefully calibrate the parameters of the camera that captures the multi-view video used to model the target 3D scene. In the inspection phase, however, the camera pose parameters and the depth maps of the captured RGB images are often unknown or noisy, because a moving camera is used to inspect the completeness of the object. In this paper, the 3D model is first transformed into a colored point cloud, which is then indexed by clustering, with each cluster representing a surface fragment of the scene. The clustering results are used to train a model-specific convolutional neural network (CNN) that annotates each pixel of an input RGB image with its fragment class. The pre-stored camera parameters and the depth information of the fragment classes are then fused to estimate the depth map and the camera pose of the current input RGB image. Experimental results show that the proposed approach outperforms the compared methods in the accuracy of camera pose estimation.
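The abstract outlines a two-step inference pipeline: cluster the model's colored point cloud into surface fragments, then recover the camera pose from a per-pixel fragment annotation of the input image. The Python sketch below illustrates that pipeline under stated assumptions: the k-means fragment clustering and the RANSAC PnP solver are stand-ins for the paper's indexing and fusion steps, and the per-pixel fragment map is assumed to come from the model-specific CNN. The helpers cluster_point_cloud and estimate_pose are hypothetical, not code from the paper.

    import numpy as np
    import cv2
    from sklearn.cluster import KMeans

    def cluster_point_cloud(points_xyz, colors_rgb, n_fragments=64):
        # Index the colored point cloud by clustering; each cluster stands
        # for one surface fragment (a stand-in for the paper's indexing).
        features = np.hstack([points_xyz, colors_rgb / 255.0])
        labels = KMeans(n_clusters=n_fragments, n_init=10).fit_predict(features)
        # Pre-store one representative 3D point (the centroid) per fragment.
        centroids = np.array([points_xyz[labels == k].mean(axis=0)
                              for k in range(n_fragments)])
        return labels, centroids

    def estimate_pose(fragment_map, centroids, K):
        # fragment_map: per-pixel fragment labels of the input RGB image,
        # assumed here to be produced by the model-specific CNN.
        pts_2d, pts_3d = [], []
        for k in range(len(centroids)):
            ys, xs = np.nonzero(fragment_map == k)
            if len(xs) == 0:
                continue  # fragment not visible in this view
            # Pair the mean pixel location of each visible fragment with
            # the fragment's pre-stored 3D centroid as a 2D-3D match.
            pts_2d.append([xs.mean(), ys.mean()])
            pts_3d.append(centroids[k])
        if len(pts_3d) < 4:
            return None  # PnP needs at least four correspondences
        ok, rvec, tvec, _ = cv2.solvePnPRansac(
            np.asarray(pts_3d, dtype=np.float32),
            np.asarray(pts_2d, dtype=np.float32),
            K, distCoeffs=None)
        return (rvec, tvec) if ok else None

Using one centroid correspondence per visible fragment keeps the PnP problem small and robust to misclassified pixels; the paper's fusion step additionally recovers a dense depth map, which this sketch omits.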
Pages: 440-452
Number of pages: 13