Learning Sparse Representations for Human Action Recognition

Cited by: 260
Authors
Guha, Tanaya [1]
Ward, Rabab Kreidieh [1]
Affiliations
[1] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V6T 1Z4, Canada
Keywords
Action recognition; dictionary learning; expression recognition; overcomplete; orthogonal matching pursuit; sparse representation; spatio-temporal descriptors
DOI
10.1109/TPAMI.2011.253
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper explores the effectiveness of sparse representations, obtained by learning an overcomplete set of basis elements (a dictionary), in the context of action recognition in videos. Although this work concentrates on recognizing human movements (physical actions as well as facial expressions), the proposed approach is fairly general and can be used to address other classification problems. To model human actions, three overcomplete dictionary learning frameworks are investigated. An overcomplete dictionary is constructed from a set of spatio-temporal descriptors (extracted from the video sequences) in such a way that each descriptor is represented by a linear combination of a small number of dictionary elements. This leads to a more compact and richer representation of the video sequences than existing methods that involve clustering and vector quantization. For each framework, a novel classification algorithm is proposed. This work also presents the idea of a new local spatio-temporal feature that is distinctive, scale invariant, and fast to compute. The proposed approach repeatedly achieves state-of-the-art results on several public data sets containing various physical actions and facial expressions.
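The abstract's central idea, representing a descriptor as a linear combination of a few dictionary elements, can be illustrated with orthogonal matching pursuit (listed in the keywords). The following is a minimal sketch, not the authors' implementation: the function names `omp` and `solve_least_squares`, the toy dictionary `D`, and the test vector are all hypothetical, and the dictionary here is fixed by hand rather than learned from video descriptors.

```python
import math


def solve_least_squares(atoms, y):
    """Least-squares coefficients for y over the given atoms, via the
    normal equations. Suitable only for small, well-conditioned systems."""
    k = len(atoms)
    # Augmented matrix [A^T A | A^T y], where the rows of `atoms` are the atoms.
    aug = [
        [sum(p * q for p, q in zip(atoms[i], atoms[j])) for j in range(k)]
        + [sum(p * q for p, q in zip(atoms[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def omp(dictionary, y, sparsity):
    """Greedy sparse coding of y over unit-norm atoms.

    Returns a dict mapping selected atom index -> coefficient,
    with at most `sparsity` entries.
    """
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        # Correlation of every atom with the current residual.
        scores = [abs(sum(a * r for a, r in zip(atom, residual)))
                  for atom in dictionary]
        candidates = [i for i in range(len(dictionary)) if i not in support]
        best = max(candidates, key=lambda i: scores[i])
        if scores[best] < 1e-12:
            break  # residual is already (numerically) explained
        support.append(best)
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        coeffs = solve_least_squares([dictionary[i] for i in support], y)
        residual = [
            yi - sum(c * dictionary[i][d] for c, i in zip(coeffs, support))
            for d, yi in enumerate(y)
        ]
    return dict(zip(support, coeffs))


# Toy overcomplete dictionary: 4 unit-norm atoms in a 3-dimensional space.
s = 1.0 / math.sqrt(2.0)
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
code = omp(D, [2.0, 0.0, 3.0], sparsity=2)
print(code)
```

In this toy case the signal is built from two atoms, so a sparsity level of 2 is enough to reconstruct it exactly. With real descriptors the sparsity level would be a tuning parameter, and the residual would generally not vanish.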
Pages: 1576-1588
Number of pages: 13