Indoor 3D Semantic Robot VSLAM Based on Mask Regional Convolutional Neural Network

Cited by: 18

Authors:

Tao, Chongben [1]

Gao, Zhen [2]

Yan, Jinli [1]

Li, Chunguang [3]

Cui, Guozeng [1]

Affiliations:

[1] Suzhou Univ Sci & Technol, Suzhou Smart City Res Inst, Suzhou 215009, Peoples R China

[2] McMaster Univ, Fac Engn, Hamilton, ON L8S 0A, Canada

[3] Changzhou Inst Technol, Sch Comp Informat & Engn, Changzhou 213002, Jiangsu, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

VSLAM; deep learning; target detection; instance segmentation; semantic map;

DOI:

10.1109/ACCESS.2020.2981648

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

When a robot builds semantic maps of indoor environments with visual SLAM (VSLAM), label classification accuracy is often low and precision degrades when feature points are sparse. To address this, this paper proposes an indoor three-dimensional semantic VSLAM algorithm based on the Mask Regional Convolutional Neural Network (Mask RCNN). First, the Oriented FAST and Rotated BRIEF (ORB) algorithm is used to extract image feature points. Second, the Random Sample Consensus (RANSAC) algorithm is employed to eliminate mismatched points and estimate changes in camera pose. Then, the Mask RCNN algorithm is applied with some of its hyperparameters adjusted, and a self-built dataset is used for transfer learning, achieving real-time target detection and instance segmentation of the scene. A three-dimensional semantic map is then constructed in combination with the VSLAM algorithm. The semantic information in the environment not only improves the accuracy of VSLAM mapping and localization, but also reduces the impact of object movement on mapping by marking movable objects. Meanwhile, the VSLAM algorithm is used to calculate the positional constraints between objects and improve the accuracy of semantic understanding. Finally, comparison with other methods shows that the proposed method is more accurate and effective. It was also verified that the proposed method can accurately interpret the semantic information in the environment for the construction of three-dimensional semantic maps.
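The RANSAC step mentioned in the abstract (rejecting mismatched points while estimating motion) can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not show. It assumes a simplified 2D rigid motion model (rotation plus translation) instead of a full camera pose, takes already-matched point pairs as input rather than ORB descriptors, and all function names, parameters, and thresholds (`fit_rigid_2d`, `ransac_rigid_2d`, `iters`, `tol`) are hypothetical.

```python
import math
import random


def fit_rigid_2d(src, dst):
    """Least-squares 2D rotation + translation mapping src points onto dst."""
    n = len(src)
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    # Accumulate dot and cross terms of the centered pairs.
    a = b = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        a += px * qx + py * qy
        b += px * qy - py * qx
    theta = math.atan2(b, a)  # angle maximizing a*cos(theta) + b*sin(theta)
    c, s = math.cos(theta), math.sin(theta)
    # Translation aligns the rotated source centroid with the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_rigid_2d(model, p):
    """Apply (theta, tx, ty) to a single 2D point."""
    theta, tx, ty = model
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def ransac_rigid_2d(src, dst, iters=200, tol=0.05, seed=0):
    """Return (model, inlier_indices) for matched pairs src[k] <-> dst[k]."""
    rng = random.Random(seed)
    idx = range(len(src))
    best_inliers = []
    for _ in range(iters):
        # Minimal sample: two correspondences determine a 2D rigid motion.
        i, j = rng.sample(idx, 2)
        model = fit_rigid_2d([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in idx
                   if math.dist(apply_rigid_2d(model, src[k]), dst[k]) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("RANSAC found no consistent motion")
    # Refit on every inlier of the best hypothesis.
    model = fit_rigid_2d([src[k] for k in best_inliers],
                         [dst[k] for k in best_inliers])
    return model, best_inliers


# Synthetic demo (assumed data): 20 correct matches under a known motion
# plus 5 gross mismatches that RANSAC should reject.
true_model = (math.radians(10.0), 0.5, -0.2)
demo_rng = random.Random(1)
src_pts = [(demo_rng.uniform(-1, 1), demo_rng.uniform(-1, 1)) for _ in range(25)]
dst_pts = [apply_rigid_2d(true_model, p) for p in src_pts]
for k in range(20, 25):
    dst_pts[k] = (dst_pts[k][0] + 3.0 + k, dst_pts[k][1] - 2.0)
est_model, inlier_ids = ransac_rigid_2d(src_pts, dst_pts)
print(est_model, inlier_ids)
```

In a real VSLAM front end the same hypothesize-and-verify loop would typically run over ORB descriptor matches and a 3D pose model; the 2D model here only keeps the sketch short and dependency-free.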

Citation

Pages: 52906-52916

Page count: 11

References: 25 in total

[1] [Anonymous], 2014, INT C LEARN REPR ICL

[2] [Anonymous], ROS OPEN SOURCE ROBO

[3] [Anonymous], COMPUT ANIMATION VIR

[4] [Anonymous], ROBOT SCI SYST

[5] BRIEF: Binary Robust Independent Elementary Features [J]. Calonder, Michael; Lepetit, Vincent; Strecha, Christoph; Fua, Pascal. COMPUTER VISION-ECCV 2010, PT IV, 2010, 6314: 778-792

[6] Instance-aware Semantic Segmentation via Multi-task Network Cascades [J]. Dai, Jifeng; He, Kaiming; Sun, Jian. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 3150-3158

[7] LSD-SLAM: Large-Scale Direct Monocular SLAM [J]. Engel, Jakob; Schoeps, Thomas; Cremers, Daniel. COMPUTER VISION - ECCV 2014, PT II, 2014, 8690: 834-849

[8] Dynamic objects elimination in SLAM based on image fusion [J]. Fan, Yingchun; Han, Hong; Tang, Yuliang; Zhi, Tao. PATTERN RECOGNITION LETTERS, 2019, 127: 191-201

[9] He KM, 2017, IEEE I CONF COMP VIS, P2980, DOI [10.1109/ICCV.2017.322, 10.1109/TPAMI.2018.2844175]

[10] Kang R., 2019, ARXIV190107223
