H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

Cited by: 169
Authors
Tekin, Bugra [1]
Bogo, Federica [1]
Pollefeys, Marc [1, 2]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00464
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras. Given a single RGB image, our model jointly estimates the 3D hand and object poses, models their interactions, and recognizes the object and action classes with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end on single images. We further merge and propagate information in the temporal domain to infer interactions between hand and object trajectories and recognize actions. The complete model takes as input a sequence of frames and outputs per-frame 3D hand and object pose predictions along with the estimates of object and action categories for the entire sequence. We demonstrate state-of-the-art performance of our algorithm even in comparison to the approaches that work on depth data and ground-truth annotations.
Pages: 4506-4515
Page count: 10