YES-SLAM: YOLOv7-enhanced-semantic visual SLAM for mobile robots in dynamic scenes

Cited by: 9

Authors:

Liu, Hang [1]

Luo, Jingwen [1,2]

Affiliations:

[1] Yunnan Normal Univ, Sch Informat Sci & Technol, 768 Juxian St, Kunming 650500, Yunnan, Peoples R China

[2] Engn Res Ctr Comp Vis & Intelligent Control Techno, Dept Educ Yunnan Prov, Kunming, Yunnan, Peoples R China

Source:

MEASUREMENT SCIENCE AND TECHNOLOGY | 2024 / Vol. 35 / Issue 03

Keywords:

dynamic scenes; simultaneous localization and mapping (SLAM); YOLOv7; depth camera; loop closure detection; 3D semantic map;

DOI:

10.1088/1361-6501/ad14e7

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

In dynamic scenes, moving objects cause significant error accumulation in the robot's pose estimation and may even lead to tracking loss. To address these problems, this paper proposes a semantic visual simultaneous localization and mapping (SLAM) algorithm based on YOLOv7. First, the lightweight network YOLOv7 is employed to acquire the semantic information of different objects in the scene, and flood-filling and edge-enhancement techniques are combined to accurately and quickly separate the dynamic feature points from the extracted feature point set. The remaining high-confidence static feature points are then used to estimate the robot's pose accurately. Next, a high-performance keyframe selection strategy is constructed from the semantic information provided by YOLOv7, the motion magnitude of the robot, and the number of dynamic feature points in the camera's field of view. On this basis, a robust loop closure detection method is developed by introducing the semantic information into the bag-of-words model, and global bundle adjustment is performed on all keyframes and map points to obtain a globally consistent pose graph. Finally, YOLOv7 is further used to perform semantic segmentation on the keyframes, remove the dynamic objects in their semantic masks, and combine point cloud pre-processing with an octree map to build a 3D semantic map for navigation. A series of simulations on the TUM dataset and a case study in a real scene demonstrated the superior performance of the proposed algorithms.
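The dynamic/static feature separation step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method: the abstract mentions flood filling and edge enhancement to refine object regions, whereas this example only tests whether a keypoint falls inside an axis-aligned detection box. The function name `split_static_dynamic`, the tuple layout of `detections`, and the choice of `"person"` as the only dynamic class are all assumptions made for illustration.

```python
import numpy as np

# Assumption: class names treated as potentially moving objects.
DYNAMIC_CLASSES = {"person"}


def split_static_dynamic(keypoints, detections, dynamic_classes=DYNAMIC_CLASSES):
    """Split keypoints into static and dynamic sets using detection boxes.

    keypoints:  (N, 2) array-like of (x, y) pixel coordinates.
    detections: list of (label, x_min, y_min, x_max, y_max) tuples
                (hypothetical layout, not a real detector's output format).
    Returns (static_points, dynamic_points) as float arrays of shape (K, 2).
    """
    pts = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    in_dynamic = np.zeros(len(pts), dtype=bool)

    for label, x0, y0, x1, y1 in detections:
        if label not in dynamic_classes:
            continue  # boxes of static classes never discard keypoints
        inside = (
            (pts[:, 0] >= x0) & (pts[:, 0] <= x1)
            & (pts[:, 1] >= y0) & (pts[:, 1] <= y1)
        )
        in_dynamic |= inside

    return pts[~in_dynamic], pts[in_dynamic]


if __name__ == "__main__":
    kps = [[10, 10], [50, 60], [200, 120]]
    dets = [("person", 40, 40, 100, 100), ("chair", 0, 0, 20, 20)]
    static, dynamic = split_static_dynamic(kps, dets)
    print("static:", static.tolist())
    print("dynamic:", dynamic.tolist())
```

A box test like this discards background keypoints that merely share a box with a moving object, which is presumably why the paper refines the object region before filtering; only the static set would be passed on to pose estimation.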

References

Pages: 19

36 entries in total

[1] DDL-SLAM: A Robust RGB-D SLAM in Dynamic Environments Combined With Deep Learning [J].

Ai, Yongbao ;

Rui, Ting ;

Lu, Ming ;

Fu, Lei ;

Liu, Shuai ;

Wang, Song .

IEEE ACCESS, 2020, 8 :162335-162342

[2] LEAST-SQUARES FITTING OF 2 3-D POINT SETS [J].

ARUN, KS ;

HUANG, TS ;

BLOSTEIN, SD .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1987, 9 (05) :699-700

[3] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J].

Badrinarayanan, Vijay ;

Kendall, Alex ;

Cipolla, Roberto .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) :2481-2495

[4] DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes [J].

Bescos, Berta ;

Facil, Jose M. ;

Civera, Javier ;

Neira, Jose .

IEEE ROBOTICS AND AUTOMATION LETTERS, 2018, 3 (04) :4076-4083

[5] PCANet: A Simple Deep Learning Baseline for Image Classification? [J].

Chan, Tsung-Han ;

Jia, Kui ;

Gao, Shenghua ;

Lu, Jiwen ;

Zeng, Zinan ;

Ma, Yi .

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (12) :5017-5032

[6] RANDOM SAMPLE CONSENSUS - A PARADIGM FOR MODEL-FITTING WITH APPLICATIONS TO IMAGE-ANALYSIS AND AUTOMATED CARTOGRAPHY [J].

FISCHLER, MA ;

BOLLES, RC .

COMMUNICATIONS OF THE ACM, 1981, 24 (06) :381-395

[7] Howard AG, 2017, Arxiv, DOI [arXiv:1704.04861, 10.48550/arXiv.1704.04861]

[8] Guo J., 2021, Photon. Laser, V32, P628, DOI 10.16136/j.joel.2021.06.0392

[9] He KM, 2020, IEEE T PATTERN ANAL, V42, P386, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]

[10] OctoMap: an efficient probabilistic 3D mapping framework based on octrees [J].

Hornung, Armin ;

Wurm, Kai M. ;

Bennewitz, Maren ;

Stachniss, Cyrill ;

Burgard, Wolfram .

AUTONOMOUS ROBOTS, 2013, 34 (03) :189-206
