Learning to Look around Objects for Top-View Representations of Outdoor Scenes

被引:53
作者
Schulter, Samuel [1 ]
Zhai, Menghua [2 ]
Jacobs, Nathan [2 ]
Chandraker, Manmohan [1 ,3 ]
机构
[1] NEC Labs, Cupertino, CA 95014 USA
[2] Univ Kentucky, Lexington, KY 40506 USA
[3] Univ Calif San Diego, La Jolla, CA 92093 USA
来源
COMPUTER VISION - ECCV 2018, PT 15 | 2018年 / 11219卷
基金
美国国家科学基金会;
关键词
3D scene understanding; Occlusion reasoning; Semantic top-view representations;
D O I
10.1007/978-3-030-01267-0_48
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Given a single RGB image of a complex outdoor road scene in the perspective view, we address the novel problem of estimating an occlusion-reasoned semantic scene layout in the top-view. This challenging problem not only requires an accurate understanding of both the 3D geometry and the semantics of the visible scene, but also of occluded areas. We propose a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians. But instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top-view. We further show that this initial top-view representation can be significantly enhanced by learning priors and rules about typical road layouts from simulated or, if available, map data. Crucially, training our model does not require costly or subjective human annotations for occluded areas or the top-view, but rather uses readily available annotations for standard semantic segmentation in the perspective view. We extensively evaluate and analyze our approach on the KITTI and Cityscapes data sets.
引用
收藏
页码:815 / 831
页数:17
相关论文
共 38 条
[1]  
[Anonymous], 2017, PENSTREETMAP CONTRIB
[2]  
[Anonymous], 2016, ARXIV161108583
[3]  
Arjovsky M, 2017, PR MACH LEARN RES, V70
[4]   3D Semantic Parsing of Large-Scale Indoor Spaces [J].
Armeni, Iro ;
Sener, Ozan ;
Zamir, Amir R. ;
Jiang, Helen ;
Brilakis, Ioannis ;
Fischer, Martin ;
Savarese, Silvio .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :1534-1543
[5]   A LIMITED MEMORY ALGORITHM FOR BOUND CONSTRAINED OPTIMIZATION [J].
BYRD, RH ;
LU, PH ;
NOCEDAL, J ;
ZHU, CY .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1995, 16 (05) :1190-1208
[6]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[7]  
Dhiman Vikas., 2016, CVPR
[8]   Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue [J].
Garg, Ravi ;
VijayKumar, B. G. ;
Carneiro, Gustavo ;
Reid, Ian .
COMPUTER VISION - ECCV 2016, PT VIII, 2016, 9912 :740-756
[9]   Vision meets robotics: The KITTI dataset [J].
Geiger, A. ;
Lenz, P. ;
Stiller, C. ;
Urtasun, R. .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2013, 32 (11) :1231-1237
[10]   3D Traffic Scene Understanding from Movable Platforms [J].
Geiger, Andreas ;
Lauer, Martin ;
Wojek, Christian ;
Stiller, Christoph ;
Urtasun, Raquel .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (05) :1012-1025