Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos

Cited by: 2
Authors
Li, Zongmian [1 ,2 ]
Sedlar, Jiri [3 ]
Carpentier, Justin [1 ,2 ]
Laptev, Ivan [1 ,2 ]
Mansard, Nicolas [4 ,5 ]
Sivic, Josef [3 ]
Affiliations
[1] PSL Res Univ, Dept Informat ENS, CNRS, Ecole Normale Super, Paris, France
[2] Inria Paris, Willow Project, Paris, France
[3] Czech Tech Univ, Czech Inst Informat Robot & Cybernet, Prague, Czech Republic
[4] Univ Toulouse, CNRS, LAAS CNRS, Toulouse, France
[5] Artif & Nat Intelligence Toulouse Institute ANITI, Toulouse, France
Keywords
Single-view 3D pose estimation; Force estimation; Person-object interaction; Instructional video; Contact recognition; Motion capture; ALGORITHMS; PEOPLE; POSE;
DOI
10.1007/s11263-021-01540-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we introduce a method to automatically reconstruct the 3D motion of a person interacting with an object from a single RGB video. Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces exerted on the human body. The main contributions of this work are three-fold. First, we introduce an approach to jointly estimate the motion and the actuation forces of the person on the manipulated object by modeling contacts and the dynamics of the interactions. This is cast as a large-scale trajectory optimization problem. Second, we develop a method to automatically recognize from the input video the 2D position and timing of contacts between the person and the object or the ground, thereby significantly reducing the complexity of the optimization. Third, we validate our approach on a recent video + MoCap dataset capturing typical parkour actions, and demonstrate its performance on a new dataset of Internet videos showing people manipulating a variety of tools in unconstrained environments.
Pages: 363-383
Number of pages: 21