Cross-View Semantic Segmentation for Sensing Surroundings

Cited by: 156
Authors
Pan, Bowen [1 ]
Sun, Jiankai [2 ]
Leung, Ho Yin Tiga [2 ]
Andonian, Alex [1 ]
Zhou, Bolei [2 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Peoples R China
Keywords
Semantic scene understanding; deep learning for visual perception; visual learning; visual-based navigation; computer vision for other robotic applications
DOI
10.1109/LRA.2020.3004325
CLC Number
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from observations. To endow robot perception with such a surround-sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation, along with a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse first-view observations into a top-down-view semantic map that indicates the spatial location of all objects at the pixel level. The main challenge of this task is the lack of real-world annotations for top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and use domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively use information from different views and modalities to understand spatial information. A further experiment on a LoCoBot robot shows that our model enables surround sensing from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.
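
As a rough illustration of how such a view-parsing pipeline can be structured, the PyTorch sketch below encodes several first-view observations, transforms each feature map into top-down-view features with an MLP over flattened spatial positions, fuses the views by summation, and decodes a pixel-level semantic map. This is a minimal sketch under assumed layer sizes with a toy encoder and decoder, not the paper's exact architecture; the module names (ViewTransformer, CrossViewSegmenter) and all dimensions here are illustrative.

    # Minimal sketch of a cross-view parsing model, loosely following the
    # View Parsing Network idea. Encoder, decoder, and sizes are assumptions.
    import torch
    import torch.nn as nn

    class ViewTransformer(nn.Module):
        """Maps first-view features to top-down-view features by learning
        a dense re-arrangement over flattened spatial positions."""
        def __init__(self, spatial_dim: int, hidden_dim: int = 2048):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(spatial_dim, hidden_dim),
                nn.ReLU(inplace=True),
                nn.Linear(hidden_dim, spatial_dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            x = x.flatten(2)          # (B, C, H*W)
            x = self.mlp(x)           # re-arrange spatial support per channel
            return x.view(b, c, h, w)

    class CrossViewSegmenter(nn.Module):
        def __init__(self, num_classes: int, feat_dim: int = 64, feat_hw: int = 32):
            super().__init__()
            # Shared first-view encoder; a real model would use a deep backbone.
            # Two stride-2 convs, so a 128x128 input yields feat_hw = 32.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            self.transformer = ViewTransformer(feat_hw * feat_hw)
            # Decoder predicts a pixel-level top-down semantic map.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(feat_dim, feat_dim, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(feat_dim, num_classes, 4, stride=2, padding=1),
            )

        def forward(self, views: torch.Tensor) -> torch.Tensor:
            # views: (B, N, 3, H, W) -- N first-view observations around the agent.
            b, n = views.shape[:2]
            feats = self.encoder(views.flatten(0, 1))   # (B*N, C, h, w)
            topdown = self.transformer(feats)           # per-view top-down features
            topdown = topdown.view(b, n, *topdown.shape[1:]).sum(dim=1)  # fuse views
            return self.decoder(topdown)                # (B, classes, H', W')

    # Usage: eight 128x128 surround views -> one top-down semantic map.
    model = CrossViewSegmenter(num_classes=13)
    views = torch.randn(2, 8, 3, 128, 128)
    logits = model(views)   # (2, 13, 128, 128)

Fusing by summing the per-view top-down features makes the model indifferent to the order of the input views, which suits observations taken at arbitrary headings around the agent.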
Pages: 4867-4873
Page count: 7