Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

Cited by: 49
Authors
Li, Xueting [1 ]
Liu, Sifei [2 ]
Kim, Kihwan [2 ]
Wang, Xiaolong [3 ]
Yang, Ming-Hsuan [1 ,4 ]
Kautz, Jan [2 ]
Affiliations
[1] Univ Calif Merced, Merced, CA 95343 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[4] Google Cloud, Mountain View, CA USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.01265
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Affordance modeling plays an important role in visual understanding. In this paper, we aim to predict affordances of 3D indoor scenes, specifically what human poses are afforded by a given indoor environment, such as sitting on a chair or standing on the floor. In order to predict valid affordances and learn possible 3D human poses in indoor scenes, we need to understand the semantic and geometric structure of a scene as well as its potential interactions with a human. To learn such a model, a large-scale dataset of 3D indoor affordances is required. In this work, we build a fully automatic 3D pose synthesizer that fuses semantic knowledge from a large number of 2D poses extracted from TV shows as well as 3D geometric knowledge from voxel representations of indoor scenes. With the data created by the synthesizer, we introduce a 3D pose generative model to predict semantically plausible and physically feasible human poses within a given scene (provided as a single RGB, RGB-D, or depth image). We demonstrate that our human affordance prediction method consistently outperforms existing state-of-the-art methods.
Pages: 12360-12368
Page count: 9
Related papers
35 records in total
[1]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848
[2]   Learning to Act Properly: Predicting and Explaining Affordances from Images [J].
Chuang, Ching-Yao ;
Li, Jiaman ;
Torralba, Antonio ;
Fidler, Sanja .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :975-983
[3]  
Das A, 2018, INT SYMP NETW CHIP
[4]   People Watching: Human Actions as a Cue for Single View Geometry [J].
Fouhey, David F. ;
Delaitre, Vincent ;
Gupta, Abhinav ;
Efros, Alexei A. ;
Laptev, Ivan ;
Sivic, Josef .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2014, 110 (03) :259-274
[5]  
Fouhey David F., 2015, DEFENSE DIRECT PERCE, V1, P2
[6]  
Gibson J. J., 1979, The ecological approach to visual perception
[7]  
Grabner H, 2011, PROC CVPR IEEE, P1529, DOI 10.1109/CVPR.2011.5995327
[8]   Cell Therapy for Critical Limb Ischemia Moving Forward One Step at a Time [J].
Gupta, Rajesh ;
Losordo, Douglas W. .
CIRCULATION-CARDIOVASCULAR INTERVENTIONS, 2011, 4 (01) :2-5
[9]  
He K., 2016, CVPR, DOI 10.1109/CVPR.2016.90
[10]   Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [J].
Huang, Xun ;
Belongie, Serge .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :1510-1519