Semantic Scene Mapping with Spatio-temporal Deep Neural Network for Robotic Applications

Cited by: 18

Authors:

Li, Ruihao [1]

Gu, Dongbing [1]

Liu, Qiang [1]

Long, Zhiqiang [2]

Hu, Huosheng [1]

Affiliations:

[1] Univ Essex, Dept Comp Sci & Elect Engn, Colchester CO4 3SQ, Essex, England

[2] Natl Univ Def Technol, Coll Mechatron & Automat, Changsha, Hunan, Peoples R China

Source:

COGNITIVE COMPUTATION | 2018 / Volume 10 / Issue 02

Keywords:

Deep learning; Spatio-temporal neural network; 3D semantic map; Robotics; SIMULTANEOUS LOCALIZATION; SLAM;

DOI:

10.1007/s12559-017-9526-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Semantic scene mapping is a challenging and significant task for robotic applications such as autonomous navigation and robot-environment interaction. In this paper, we propose a semantic pixel-wise mapping system for potential robotic applications. The system includes a novel spatio-temporal deep neural network for semantic segmentation and a Simultaneous Localisation and Mapping (SLAM) algorithm that builds a 3D point cloud map. Their combination yields a 3D semantic pixel-wise map. The proposed network consists of Convolutional Neural Networks (CNNs) with two streams: a spatial stream with images as the input and a temporal stream with image differences as the input. Because it uses both spatial and temporal information, it is called a spatio-temporal deep neural network, and it shows better accuracy and robustness in semantic segmentation. Further, only keyframes are selected for semantic segmentation, in order to reduce the computational burden for video streams and improve real-time performance. Based on the semantic segmentation results, a 3D semantic map is built using the 3D point cloud map from the SLAM algorithm. The proposed spatio-temporal neural network is evaluated on both the Cityscapes benchmark (a public dataset) and the Essex Indoor benchmark (a dataset we labelled manually). Compared with state-of-the-art spatial-only neural networks, the proposed network achieves better performance in both pixel-wise accuracy and Intersection over Union (IoU) for scene segmentation. The 3D semantic map constructed with our method is accurate and meaningful for robotic applications.
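
The abstract reports results in pixel-wise accuracy and Intersection over Union (IoU). As a rough illustration of how these two metrics are commonly computed, the sketch below derives both from a confusion matrix. It is not taken from the paper: the function names, the flattened per-pixel label lists, and the choice to skip classes that never occur are all assumptions made for this example, and benchmark suites may aggregate differently.

```python
def confusion_matrix(pred, truth, num_classes):
    """Count label pairs; rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def per_class_iou(matrix):
    """IoU per class: TP / (TP + FP + FN); None if the class never occurs."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # truth is c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, truth is otherwise
        union = tp + fp + fn
        ious.append(tp / union if union else None)
    return ious


def mean_iou(matrix):
    """Average IoU over classes that occur (assumption: absent classes are skipped)."""
    valid = [v for v in per_class_iou(matrix) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


# Hypothetical toy labels for six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, truth, num_classes=3)
accuracy = pixel_accuracy(cm)   # 4 of 6 pixels are correct
miou = mean_iou(cm)             # mean of 1/3, 2/3 and 1/2
```

In this toy case accuracy is higher than mean IoU, because IoU penalises both false positives and false negatives for each class. This is one reason segmentation work usually reports the two numbers together.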


Pages: 260-271

Number of pages: 12

