Cross-View Semantic Segmentation for Sensing Surroundings

Cited by: 156
Authors
Pan, Bowen [1 ]
Sun, Jiankai [2 ]
Leung, Ho Yin Tiga [2 ]
Andonian, Alex [1 ]
Zhou, Bolei [2 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Peoples R China
Keywords
Semantic scene understanding; deep learning for visual perception; visual learning; visual-based navigation; computer vision for other robotic applications
DOI
10.1109/LRA.2020.3004325
CLC Number
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from observations. To endow robot perception with such a surround-sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation, along with a framework named View Parsing Network (VPN) to address it. In the cross-view semantic segmentation task, the agent is trained to parse first-view observations into a top-down-view semantic map that indicates the spatial location of all objects at the pixel level. The main challenge of this task is the lack of real-world annotations for top-down-view data. To mitigate this, we train the VPN in a 3D graphics environment and use domain adaptation to transfer it to real-world data. We evaluate our VPN on both synthetic and real-world agents. The experimental results show that our model can effectively use information from different views and modalities to understand spatial information. A further experiment on a LoCoBot robot shows that our model enables surround sensing from 2D image input. Code and demo videos can be found at https://view-parsing-network.github.io.
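
As a rough illustration of how such a view-parsing pipeline can be structured, the PyTorch sketch below encodes several first-view observations, transforms each feature map into top-down-view features with an MLP over flattened spatial positions, fuses the views by summation, and decodes a pixel-level semantic map. This is a minimal sketch under assumed layer sizes with a toy encoder and decoder, not the paper's exact architecture; the module names (ViewTransformer, CrossViewSegmenter) and all dimensions here are illustrative.

    # Minimal sketch of a cross-view parsing model, loosely following the
    # View Parsing Network idea. Encoder, decoder, and sizes are assumptions.
    import torch
    import torch.nn as nn

    class ViewTransformer(nn.Module):
        """Maps first-view features to top-down-view features by learning
        a dense re-arrangement over flattened spatial positions."""
        def __init__(self, spatial_dim: int, hidden_dim: int = 2048):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(spatial_dim, hidden_dim),
                nn.ReLU(inplace=True),
                nn.Linear(hidden_dim, spatial_dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            x = x.flatten(2)          # (B, C, H*W)
            x = self.mlp(x)           # re-arrange spatial support per channel
            return x.view(b, c, h, w)

    class CrossViewSegmenter(nn.Module):
        def __init__(self, num_classes: int, feat_dim: int = 64, feat_hw: int = 32):
            super().__init__()
            # Shared first-view encoder; a real model would use a deep backbone.
            # Two stride-2 convs, so a 128x128 input yields feat_hw = 32.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            self.transformer = ViewTransformer(feat_hw * feat_hw)
            # Decoder predicts a pixel-level top-down semantic map.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(feat_dim, feat_dim, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(feat_dim, num_classes, 4, stride=2, padding=1),
            )

        def forward(self, views: torch.Tensor) -> torch.Tensor:
            # views: (B, N, 3, H, W) -- N first-view observations around the agent.
            b, n = views.shape[:2]
            feats = self.encoder(views.flatten(0, 1))   # (B*N, C, h, w)
            topdown = self.transformer(feats)           # per-view top-down features
            topdown = topdown.view(b, n, *topdown.shape[1:]).sum(dim=1)  # fuse views
            return self.decoder(topdown)                # (B, classes, H', W')

    # Usage: eight 128x128 surround views -> one top-down semantic map.
    model = CrossViewSegmenter(num_classes=13)
    views = torch.randn(2, 8, 3, 128, 128)
    logits = model(views)   # (2, 13, 128, 128)

Fusing by summing the per-view top-down features makes the model indifferent to the order of the input views, which suits observations taken at arbitrary headings around the agent.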
Pages: 4867-4873
Page count: 7