From Actemes to Action: A Strongly-supervised Representation for Detailed Action Understanding

Cited by: 239
Authors
Zhang, Weiyu [1]
Zhu, Menglong [1]
Derpanis, Konstantinos G. [2]
Affiliations
[1] Univ Penn, GRASP Lab, Philadelphia, PA 19104 USA
[2] Ryerson Univ, Dept Comp Sci, Toronto, ON, Canada
Source
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2013
DOI
10.1109/ICCV.2013.280
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric, x-y-t, patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, keypoint labels (e.g., position) across spacetime are used in a data-driven training process to discover patches that are highly clustered in the spacetime keypoint configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D position of a set of keypoints, and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localization as additional output. This output sheds further light on detailed action understanding.
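The sliding volume scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the video tensor layout, the `scorer` callable standing in for a trained acteme classifier, and the greedy non-overlap selection are all illustrative assumptions.

```python
import numpy as np

def sliding_volume_detections(video, scorer, vol=(8, 16, 16),
                              stride=(4, 8, 8), thresh=0.9):
    """Scan an x-y-t classifier over a video (T, H, W) and return
    sparse, non-overlapping detections (score, t, y, x).

    `scorer` is a hypothetical stand-in for a trained acteme classifier:
    it maps a (vt, vh, vw) patch to a confidence score.
    """
    T, H, W = video.shape
    vt, vh, vw = vol
    st, sh, sw = stride
    candidates = []
    # Exhaustive scan of the spacetime volume at the given stride.
    for t in range(0, T - vt + 1, st):
        for y in range(0, H - vh + 1, sh):
            for x in range(0, W - vw + 1, sw):
                score = scorer(video[t:t + vt, y:y + vh, x:x + vw])
                if score >= thresh:
                    candidates.append((score, t, y, x))
    # Greedy suppression: keep the highest-scoring detections first,
    # rejecting any candidate whose volume overlaps an already-kept one.
    candidates.sort(reverse=True)
    kept = []
    for s, t, y, x in candidates:
        if all(abs(t - t2) >= vt or abs(y - y2) >= vh or abs(x - x2) >= vw
               for _, t2, y2, x2 in kept):
            kept.append((s, t, y, x))
    return kept
```

For example, with a synthetic video containing a single bright x-y-t cube and mean intensity as the scorer, the scan returns one detection anchored at the cube's corner.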
Pages: 2248-2255
Page count: 8