Semantic Scene Mapping with Spatio-temporal Deep Neural Network for Robotic Applications

Times cited: 18
Authors
Li, Ruihao [1]
Gu, Dongbing [1]
Liu, Qiang [1]
Long, Zhiqiang [2]
Hu, Huosheng [1]
Affiliations
[1] Department of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, Essex, England
[2] College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan, People's Republic of China
Keywords
Deep learning; Spatio-temporal neural network; 3D semantic map; Robotics; Simultaneous localization; SLAM
DOI
10.1007/s12559-017-9526-9
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Semantic scene mapping is a challenging and significant task for robotic applications such as autonomous navigation and robot-environment interaction. In this paper, we propose a semantic pixel-wise mapping system for potential robotic applications. The system combines a novel spatio-temporal deep neural network for semantic segmentation with a Simultaneous Localisation and Mapping (SLAM) algorithm that builds a 3D point cloud map; together they yield a 3D semantic pixel-wise map. The proposed network consists of Convolutional Neural Networks (CNNs) with two streams: a spatial stream that takes images as input and a temporal stream that takes image differences as input. Because it uses both spatial and temporal information, we call it a spatio-temporal deep neural network; this design improves both accuracy and robustness in semantic segmentation. Furthermore, only keyframes are selected for semantic segmentation, which reduces the computational burden of processing video streams and improves real-time performance. Based on the segmentation results, a 3D semantic map is built from the 3D point cloud map produced by the SLAM algorithm. The proposed spatio-temporal neural network is evaluated on both the Cityscapes benchmark (a public dataset) and the Essex Indoor benchmark (a dataset we labelled manually). Compared with state-of-the-art spatial-only neural networks, the proposed network achieves better performance in both pixel-wise accuracy and Intersection over Union (IoU) for scene segmentation. The 3D semantic map constructed with our method is accurate and meaningful for robotic applications.
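The abstract reports segmentation results in pixel-wise accuracy and IoU without defining them. The sketch below assumes the common definitions (pixel accuracy = correctly labelled pixels / labelled pixels; per-class IoU = TP / (TP + FP + FN), averaged over classes for mean IoU) and computes them from a confusion matrix over flattened label maps. The function names and the `ignore_index=255` convention for unlabelled pixels are hypothetical choices for illustration, not the paper's evaluation code or protocol.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], gt: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground truth, prediction) pair; rows = gt, cols = pred.

    ignore_index marks unlabelled pixels (an assumed convention) and is skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue
        cm[g][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of labelled pixels whose predicted class matches the ground truth."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU = TP / (TP + FP + FN) per class; NaN if a class is absent from both maps."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # gt is c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, gt is another class
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU over classes, skipping NaN entries (classes that never occur)."""
    vals = [v for v in per_class_iou(cm) if v == v]  # NaN != NaN filters them out
    return sum(vals) / len(vals) if vals else 0.0


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 255]   # last pixel is unlabelled and ignored
    pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(pred, gt, num_classes=3)
    print(pixel_accuracy(cm))         # 0.8
    print(round(mean_iou(cm), 4))     # 0.7222
```

Accumulating a confusion matrix first lets both metrics be derived from a single pass over the pixels, and the same matrix can be summed across many frames before computing dataset-level scores.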
Pages: 260-271
Number of pages: 12