YGC-SLAM: A visual SLAM based on improved YOLOv5 and geometric constraints for dynamic indoor environments

Cited by: 2
Authors
Zhang, Juncheng [1]
Ke, Fuyang [2, 3]
Tang, Qinqin [4]
Yu, Wenming [1]
Zhang, Ming [1]
Affiliations
[1] Univ Informat Sci & Technol, Dept Automat, Nanjing 210044, Peoples R China
[2] Univ Informat Sci & Technol, Dept Software, Nanjing 210044, Peoples R China
[3] Nanjing Univ Informat Engn, Wuxi Res Inst, Wuxi 214101, Peoples R China
[4] Univ Wuxi, Dept Rail Transport, Wuxi 214101, Peoples R China
Keywords
Visual SLAM; Dynamic SLAM; Target detection; Geometric constraints
DOI
10.1016/j.vrih.2024.05.001
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Background: Visual simultaneous localization and mapping (SLAM) generally assumes a static scene, so dynamic objects in the frame degrade system robustness and make position estimation inaccurate. In this study, we propose YGC-SLAM, a system for dynamic indoor environments that builds on the ORB-SLAM2 framework and combines semantic and geometric constraints to improve positioning accuracy and robustness.
Methods: First, the recognition accuracy of YOLOv5 was improved by introducing the convolutional block attention module (CBAM) and an improved EIOU loss function, so that the predicted bounding box converges quickly and detection improves. The improved YOLOv5 was then added to the tracking thread to detect dynamic targets and eliminate dynamic points. Subsequently, multi-view geometric constraints were used to re-judge the points. This step further eliminates dynamic points while retaining more useful feature points, and it prevents the semantic approach from over-eliminating feature points, which would cause map building to fail. The K-means clustering algorithm was used to accelerate this process by quickly determining the motion state of each cluster of pixels. Finally, a keyframe strategy with redundancy removal was implemented to construct a clear, dense, static 3D point-cloud map.
Results: In tests on the TUM dataset and in a real environment, our algorithm reduced the absolute trajectory error by 98.22% and the relative trajectory error by 97.98% compared with the original ORB-SLAM2. It is also more accurate and has better real-time performance than similar algorithms such as DynaSLAM and DS-SLAM.
Conclusions: The proposed YGC-SLAM effectively eliminates the adverse effects of dynamic objects, and the system completes positioning and map building tasks more reliably in complex environments.
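The abstract does not spell out the multi-view geometric check, so the following is only a minimal sketch of one common form of it: the epipolar constraint. A matched point in the current frame should lie close to the epipolar line induced by its counterpart in the previous frame, and a large distance suggests the point moved independently of the camera. The function names, the 1-pixel threshold, and the example fundamental matrix are assumptions made for illustration and are not taken from the paper.

```python
import math

def epipolar_distance(F, p1, p2):
    """Distance (in pixels) from p2 to the epipolar line F @ [p1, 1]."""
    x1 = (p1[0], p1[1], 1.0)
    # Epipolar line coefficients (a, b, c) in the second image.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        # Degenerate line: treat as unreliable (assumption of this sketch).
        return float("inf")
    return abs(a * p2[0] + b * p2[1] + c) / norm

def flag_dynamic(F, matches, threshold=1.0):
    """Return True for each match whose epipolar distance exceeds threshold.

    `threshold` is a hypothetical value; a real system would tune it.
    """
    return [epipolar_distance(F, p1, p2) > threshold for p1, p2 in matches]

# Hypothetical fundamental matrix for a pure sideways camera translation
# with identity intrinsics: static points keep their image row.
F_SIDEWAYS = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]

matches = [((10.0, 20.0), (14.0, 20.0)),   # stays on its row: static
           ((30.0, 40.0), (33.0, 47.0))]   # leaves its row: likely dynamic
flags = flag_dynamic(F_SIDEWAYS, matches)
```

In a pipeline like the one described, such a check would plausibly be applied per cluster rather than per point, which is where the K-means step could reduce the amount of computation.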
Pages: 62-82
Page count: 21