Populating 3D Scenes by Learning Human-Scene Interaction

Cited by: 67
Authors
Hassan, Mohamed [1 ]
Ghosh, Partha [1 ]
Tesch, Joachim [1 ]
Tzionas, Dimitrios [1 ]
Black, Michael J. [1 ]
Affiliation
[1] Max Planck Institute for Intelligent Systems, Tübingen, Germany
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Keywords
MODEL; CAPTURE;
DOI
10.1109/CVPR46437.2021.01447
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Humans live within a 3D space and constantly interact with it to perform tasks. Such interactions involve physical contact between surfaces that is semantically meaningful. Our goal is to learn how humans interact with scenes and leverage this to enable virtual characters to do the same. To that end, we introduce a novel Human-Scene Interaction (HSI) model that encodes proximal relationships, called POSA for "Pose with prOximitieS and contActs". The representation of interaction is body-centric, which enables it to generalize to new scenes. Specifically, POSA augments the SMPL-X parametric human body model such that, for every mesh vertex, it encodes (a) the contact probability with the scene surface and (b) the corresponding semantic scene label. We learn POSA with a VAE conditioned on the SMPL-X vertices, and train on the PROX dataset, which contains SMPL-X meshes of people interacting with 3D scenes, and the corresponding scene semantics from the PROX-E dataset. We demonstrate the value of POSA with two applications. First, we automatically place 3D scans of people in scenes. We use a SMPL-X model fit to the scan as a proxy and then find its most likely placement in 3D. POSA provides an effective representation to search for "affordances" in the scene that match the likely contact relationships for that pose. We perform a perceptual study that shows significant improvement over the state of the art on this task. Second, we show that POSA's learned representation of body-scene interaction supports monocular human pose estimation that is consistent with a 3D scene, improving on the state of the art. Our model and code are available for research purposes at https://posa.is.tue.mpg.de.
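The abstract describes POSA's body-centric representation: for every SMPL-X mesh vertex, the model encodes a contact probability and a distribution over scene-semantic labels, and this feature map can be used to score candidate placements of a body in a scene. The sketch below illustrates the shape of such a per-vertex feature map and a purely schematic placement score; all names, the class count, and the scoring rule are illustrative assumptions, not the released POSA code.

```python
import numpy as np

# Hypothetical sketch of a POSA-style per-vertex feature map.
# Names and the scoring rule are illustrative, not from the paper's code.

NUM_VERTICES = 10475   # SMPL-X body mesh vertex count
NUM_CLASSES = 8        # assumed number of scene-semantic categories

rng = np.random.default_rng(0)

# Per-vertex contact probability in [0, 1] (stand-in for a cVAE decoder output).
contact = rng.random((NUM_VERTICES, 1))

# Per-vertex semantic label distribution (softmax-normalized random logits).
logits = rng.normal(size=(NUM_VERTICES, NUM_CLASSES))
semantics = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# The full feature map concatenates both signals: shape (V, 1 + C).
feature_map = np.concatenate([contact, semantics], axis=1)
print(feature_map.shape)  # (10475, 9)

# Schematic placement score: over likely-contact vertices, sum the
# predicted probability of the scene class actually found under each vertex.
scene_class_under_vertex = rng.integers(0, NUM_CLASSES, NUM_VERTICES)
likely_contact = contact[:, 0] > 0.5
score = semantics[np.arange(NUM_VERTICES), scene_class_under_vertex][likely_contact].sum()
```

In the paper, searching for "affordances" amounts to maximizing such agreement between the body-centric prediction and the actual scene geometry and semantics over candidate placements.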
Pages: 14703–14713 (11 pages)