Semantic Scene Mapping with Spatio-temporal Deep Neural Network for Robotic Applications

Cited by: 18

Authors:

Li, Ruihao [1]

Gu, Dongbing [1]

Liu, Qiang [1]

Long, Zhiqiang [2]

Hu, Huosheng [1]

Affiliations:

[1] Univ Essex, Dept Comp Sci & Elect Engn, Colchester CO4 3SQ, Essex, England

[2] Natl Univ Def Technol, Coll Mechatron & Automat, Changsha, Hunan, Peoples R China

Source:

COGNITIVE COMPUTATION | 2018 / Vol. 10 / Issue 02

Keywords:

Deep learning; Spatio-temporal neural network; 3D semantic map; Robotics; SIMULTANEOUS LOCALIZATION; SLAM;

DOI:

10.1007/s12559-017-9526-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Semantic scene mapping is a challenging and significant task for robotic applications such as autonomous navigation and robot-environment interaction. In this paper, we propose a semantic pixel-wise mapping system for potential robotic applications. The system includes a novel spatio-temporal deep neural network for semantic segmentation and a Simultaneous Localisation and Mapping (SLAM) algorithm that produces a 3D point cloud map. Their combination yields a 3D semantic pixel-wise map. The proposed network consists of Convolutional Neural Networks (CNNs) with two streams: a spatial stream with images as the input and a temporal stream with image differences as the input. Because it uses both spatial and temporal information, it is called a spatio-temporal deep neural network, and it shows better accuracy and robustness in semantic segmentation. Further, only keyframes are selected for semantic segmentation, in order to reduce the computational burden for video streams and improve real-time performance. Based on the result of semantic segmentation, a 3D semantic map is built using the 3D point cloud map from the SLAM algorithm. The proposed spatio-temporal neural network is evaluated on both the Cityscapes benchmark (a public dataset) and the Essex Indoor benchmark (a dataset we labelled manually ourselves). Compared with state-of-the-art spatial-only neural networks, the proposed network achieves better performance in both pixel-wise accuracy and Intersection over Union (IoU) for scene segmentation. The 3D semantic map constructed with our method is accurate and meaningful for robotic applications.
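
The abstract names two segmentation metrics, pixel-wise accuracy and Intersection over Union (IoU), without giving the evaluation protocol. As a rough illustration only, the sketch below computes both from a confusion matrix using the standard textbook definitions. The function names and the toy label maps are hypothetical, and the paper's own evaluation (for example, how ignored classes are treated) may differ.

```python
from typing import List

LabelMap = List[List[int]]  # rows of per-pixel class indices (assumed layout)


def confusion_matrix(pred: LabelMap, truth: LabelMap, num_classes: int) -> List[List[int]]:
    """Count pixels: cm[t][p] = number of pixels with true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def class_iou(cm: List[List[int]]) -> List[float]:
    """Per-class IoU = TP / (TP + FP + FN); NaN for classes absent from both maps."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


# Toy 2x3 label maps with three classes (illustrative values, not from the paper).
truth = [[0, 0, 1], [1, 1, 2]]
pred = [[0, 1, 1], [1, 1, 2]]
cm = confusion_matrix(pred, truth, num_classes=3)
accuracy = pixel_accuracy(cm)
ious = class_iou(cm)
```

Averaging the per-class values gives the mean IoU commonly reported on benchmarks such as Cityscapes. Unlike pixel accuracy, it is not dominated by large classes such as road or wall.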

Pages: 260-271

Number of pages: 12
