Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition

Cited by: 15
Authors
Tian, Yi [1 ]
Kong, Yu [2 ]
Ruan, Qiuqi [1 ]
An, Gaoyun [1 ]
Fu, Yun [2 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Funding
National Natural Science Foundation of China;
Keywords
Action Recognition; locally consistent group sparse coding; hierarchical sparse coding scheme; absolute and relative location models; IMAGE CLASSIFICATION; MOTION; FEATURES; VECTOR; ROBUST;
DOI
10.1109/TIP.2017.2788196
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a novel two-layer video representation for human action recognition that employs a hierarchical group sparse encoding technique and spatio-temporal structure. In the first layer, a new sparse encoding method, locally consistent group sparse coding (LCGSC), is proposed to make full use of the motion and appearance information of local features. LCGSC not only encodes the global layout of features within the same video-level group but also captures the local correlations between them, yielding expressive sparse representations of video sequences. Meanwhile, two efficient location estimation models, an absolute location model and a relative location model, are developed to incorporate spatio-temporal structure into the LCGSC representations. In the second layer, an action-level group is established, where a hierarchical LCGSC encoding scheme describes videos at different levels of abstraction. On the one hand, the new layer captures higher-order dependencies between video sequences; on the other hand, it takes label information into consideration to improve the discriminative power of video representations. The superiority of our hierarchical framework is demonstrated on several challenging datasets.
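To make the group-sparse encoding idea in the abstract concrete, below is a minimal sketch of encoding the local features of one video against a fixed dictionary under an ℓ2,1 (row-wise group) penalty, solved with proximal gradient descent (ISTA). This is an illustrative assumption, not the authors' full LCGSC method: the locality-consistency term and the absolute/relative location models are omitted, and all names (`lcgsc_encode`, `group_soft_threshold`) are hypothetical.

```python
import numpy as np

def group_soft_threshold(C, lam):
    # Proximal operator of lam * sum_j ||C[j, :]||_2:
    # shrinks each dictionary atom's row of codes toward zero as a group,
    # so the features of one video tend to share the same atoms.
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return C * scale

def lcgsc_encode(X, D, lam=0.1, n_iter=200):
    """Group sparse coding sketch for one video-level group.

    X : (d, n) matrix of n local features from one video.
    D : (d, k) dictionary with k atoms.
    Minimizes 0.5 * ||X - D C||_F^2 + lam * sum_j ||C[j, :]||_2
    via ISTA, starting from the all-zero code matrix.
    """
    k = D.shape[1]
    C = np.zeros((k, X.shape[1]))
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = D.T @ (D @ C - X)            # gradient of the data term
        C = group_soft_threshold(C - grad / L, lam / L)
    return C
```

With the ISTA step size `1/L`, the objective decreases monotonically from its value at `C = 0`; in the paper's setting the resulting codes would then be pooled into a video-level representation before the second (action-level) layer is applied.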
Pages: 1748-1762
Page count: 15