Cross-View Semantic Segmentation for Sensing Surroundings

Cited by: 192
Authors
Pan, Bowen [1 ]
Sun, Jiankai [2 ]
Leung, Ho Yin Tiga [2 ]
Andonian, Alex [1 ]
Zhou, Bolei [2 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Peoples R China
Keywords
Semantic scene understanding; deep learning for visual perception; visual learning; visual-based navigation; computer vision for other robotic applications;
DOI
10.1109/LRA.2020.3004325
Chinese Library Classification (CLC)
TP24 [Robotics];
Discipline Codes
080202; 1405;
Abstract
Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from observations. To equip robot perception with such a surrounding-sensing capability, we introduce a novel visual task called cross-view semantic segmentation, as well as a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse first-view observations into a top-down-view semantic map that indicates the spatial locations of all objects at the pixel level. The main difficulty of this task is that real-world annotations of top-down-view data are lacking. To mitigate this, we train the VPN in a 3D graphics environment and use domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively use information from different views and multiple modalities to understand spatial information. A further experiment on a LoCoBot robot shows that our model enables surrounding sensing from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.
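The core operation the abstract describes is mapping a first-view feature representation to a top-down-view layout. Below is a minimal, illustrative sketch of such a cross-view transform: a fully-connected mapping over the flattened spatial dimensions of a feature map. All shapes, names, and the random weights are assumptions for demonstration; the actual VPN learns this transform end-to-end with convolutional encoders and decoders.

```python
import numpy as np

def view_transform(first_view_feat, weights):
    """Map a first-view feature map of shape (C, H, W) to a top-down
    feature map of shape (C, H', W') by applying a fully-connected
    transform over the flattened spatial dimensions.

    weights has shape (H*W, H'*W'); in a trained model these would be
    learned parameters, here they are random placeholders."""
    c, h, w = first_view_feat.shape
    flat = first_view_feat.reshape(c, h * w)       # (C, H*W)
    top_down_flat = flat @ weights                 # (C, H'*W')
    hw_out = int(np.sqrt(weights.shape[1]))        # assume square output
    return top_down_flat.reshape(c, hw_out, hw_out)

rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 8, 8))             # toy first-view features
W = rng.standard_normal((8 * 8, 16 * 16)) * 0.01   # placeholder for learned weights
top_down = view_transform(feat, W)
print(top_down.shape)                              # (64, 16, 16)
```

The key property this sketch captures is that every first-view spatial position can contribute to every top-down position, which is what lets the network learn the geometric view change rather than hard-coding a camera model.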
Pages: 4867-4873
Page count: 7