Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition

Cited by: 15
Authors
Tian, Yi [1 ]
Kong, Yu [2 ]
Ruan, Qiuqi [1 ]
An, Gaoyun [1 ]
Fu, Yun [2 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Funding
National Natural Science Foundation of China;
Keywords
Action Recognition; locally consistent group sparse coding; hierarchical sparse coding scheme; absolute and relative location models; IMAGE CLASSIFICATION; MOTION; FEATURES; VECTOR; ROBUST;
DOI
10.1109/TIP.2017.2788196
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a novel two-layer video representation for human action recognition that employs a hierarchical group sparse encoding technique and spatio-temporal structure. In the first layer, a new sparse encoding method named locally consistent group sparse coding (LCGSC) is proposed to make full use of the motion and appearance information of local features. The LCGSC method not only encodes the global layouts of features within the same video-level groups, but also captures the local correlations between them, yielding expressive sparse representations of video sequences. Meanwhile, two efficient location estimation models, namely an absolute location model and a relative location model, are developed to incorporate spatio-temporal structure into the LCGSC representations. In the second layer, an action-level group is established, where a hierarchical LCGSC encoding scheme is applied to describe videos at different levels of abstraction. On the one hand, the new layer captures higher-order dependencies between video sequences; on the other hand, it takes label information into consideration to improve the discriminative power of the video representations. The advantages of our hierarchical framework are demonstrated on several challenging datasets.
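The record does not specify the LCGSC optimization itself. As a rough, hedged illustration of the group sparse coding family it builds on, the sketch below solves a plain group-lasso encoding problem with proximal gradient descent (ISTA); the function and parameter names (`group_sparse_code`, `lam`) are ours, not the paper's, and the locality-consistency and location-model terms described in the abstract are omitted.

```python
import numpy as np

def group_soft_threshold(c, groups, t):
    """Proximal operator of t * sum_g ||c_g||_2 (the group-lasso penalty):
    shrinks each group's coefficient block toward zero, zeroing whole groups."""
    out = c.copy()
    for g in groups:
        norm = np.linalg.norm(c[g])
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * c[g]
    return out

def group_sparse_code(D, x, groups, lam=0.1, n_iter=200):
    """Encode signal x over dictionary D (d x k atoms) by minimizing
    0.5 * ||x - D c||^2 + lam * sum_g ||c_g||_2 with proximal gradient descent."""
    k = D.shape[1]
    c = np.zeros(k)
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)          # gradient of the quadratic data term
        c = group_soft_threshold(c - step * grad, groups, step * lam)
    return c
```

With the step size set to the inverse Lipschitz constant, each iteration is guaranteed not to increase the objective, so the returned code is at least as good as the all-zero encoding.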
Pages: 1748-1762
Page count: 15
Related Papers
63 records in total
[31]  
Liu JG, 2009, PROC CVPR IEEE, P1996
[32]  
Liu JG, 2009, PROC CVPR IEEE, P461, DOI 10.1109/CVPRW.2009.5206845
[33]   Automatic analysis of multimodal group actions in meetings [J].
McCowan, I ;
Gatica-Perez, D ;
Bengio, S ;
Lathoud, G ;
Barnard, M ;
Zhang, D .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (03) :305-317
[34]   Unsupervised learning of human action categories using spatial-temporal words [J].
Niebles, Juan Carlos ;
Wang, Hongcheng ;
Fei-Fei, Li .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2008, 79 (03) :299-318
[35]  
Niebles JC, 2010, LECT NOTES COMPUT SC, V6312, P392, DOI 10.1007/978-3-642-15552-9_29
[36]   Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice [J].
Peng, Xiaojiang ;
Wang, Limin ;
Wang, Xingxing ;
Qiao, Yu .
COMPUTER VISION AND IMAGE UNDERSTANDING, 2016, 150 :109-125
[37]  
Peng XJ, 2014, LECT NOTES COMPUT SC, V8693, P581, DOI 10.1007/978-3-319-10602-1_38
[38]   Improving the Fisher Kernel for Large-Scale Image Classification [J].
Perronnin, Florent ;
Sanchez, Jorge ;
Mensink, Thomas .
COMPUTER VISION-ECCV 2010, PT IV, 2010, 6314 :143-156
[39]  
Qiong Hu, 2010, Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR 2010), P1521, DOI 10.1109/ICPR.2010.376
[40]  
Rodriguez M. D., 2008, PROC IEEE C COMPUT V, P1