An approach to pose-based action recognition

Cited by: 246
Authors
Wang, Chunyu [1]
Wang, Yizhou [1]
Yuille, Alan L. [2]
Affiliations
[1] Peking Univ, Schl EECS, Key Lab Machine Percept MoE, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[2] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA USA
Source
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2013
DOI
10.1109/CVPR.2013.123
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state-of-the-art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimates output by the existing method and incorporate additional segmentation cues and temporal constraints to select the "best" one. Then we group the estimated joints into five body parts (e.g., the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors in joint estimation. Experimental results first show that our approach is able to localize body joints more accurately than existing methods. Next we show that it outperforms state-of-the-art action recognizers on the UCF Sports, the Keck Gesture and the MSR-Action3D datasets.
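The abstract's idea of choosing one pose per frame from K candidates under temporal constraints can be illustrated with a generic Viterbi-style dynamic program. This is only a minimal sketch under stated assumptions, not the authors' formulation: the function name `select_best_poses`, the `unary` cost array (a hypothetical stand-in for a segmentation-based score, lower is better), and the mean joint displacement used as the temporal term are all illustrative choices that do not come from the record.

```python
import numpy as np

def select_best_poses(candidates, unary, smooth_weight=1.0):
    """Pick one candidate pose per frame by dynamic programming.

    candidates: array of shape (T, K, J, 2), K candidate poses per frame,
                each with J joints given as (x, y). Hypothetical layout.
    unary:      array of shape (T, K), per-candidate cost (lower is better).
                Assumed stand-in for a segmentation cue.
    Returns a list of T chosen candidate indices.
    """
    candidates = np.asarray(candidates, dtype=float)
    unary = np.asarray(unary, dtype=float)
    T, K = unary.shape
    cost = unary[0].copy()                  # best cost ending at each candidate
    back = np.zeros((T, K), dtype=int)      # backpointers for path recovery
    for t in range(1, T):
        # pairwise[i, j]: mean joint displacement from candidate i at frame
        # t-1 to candidate j at frame t (assumed temporal smoothness term)
        diff = candidates[t - 1][:, None] - candidates[t][None, :]
        pairwise = np.linalg.norm(diff, axis=-1).mean(axis=-1)
        total = cost[:, None] + smooth_weight * pairwise
        back[t] = total.argmin(axis=0)
        cost = total.min(axis=0) + unary[t]
    # Trace the cheapest path backwards from the last frame
    path = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Setting `smooth_weight` to 0 reduces the selection to a per-frame minimum of `unary`; larger values favor sequences whose joints move little between consecutive frames.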
Pages: 915-922
Number of pages: 8