Multi-view discriminative and structured dictionary learning with group sparsity for human action recognition

Cited by: 82
Authors
Gao, Z. [1 ,2 ]
Zhang, H. [1 ,2 ]
Xu, G. P. [1 ,2 ]
Xue, Y. B. [1 ,2 ]
Hauptmann, A. G. [3 ]
Affiliations
[1] Tianjin Univ Technol, Minist Educ, Key Lab Comp Vis & Syst, Tianjin 300384, Peoples R China
[2] Tianjin Univ Technol, Tianjin Key Lab Intelligence Comp & Novel Softwar, Tianjin 300384, Peoples R China
[3] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Funding
National Natural Science Foundation of China;
Keywords
Multi-view action recognition; Dictionary learning; Graph model; MVBoW; Group sparsity; Discriminative and structured; GM-GS-DSDL; 3-D OBJECT RETRIEVAL; VARIABLE SELECTION; REPRESENTATION;
DOI
10.1016/j.sigpro.2014.08.034
CLC classification
TM [Electrical technology]; TN [Electronic technology, communication technology];
Subject classification codes
0808; 0809;
Abstract
Human actions may be observed from multiple views, which are highly related yet can look quite different from one another. Traditional metric learning algorithms achieve satisfactory performance on a single view, but they often fail, or perform unsatisfactorily, when used to fuse different views. We therefore propose multi-view discriminative and structured dictionary learning with group sparsity and a graph model (GM-GS-DSDL) to fuse different views and recognize human actions. First, spatio-temporal interest points are extracted for each view and a multi-view bag-of-words (MVBoW) representation is built; at the same time, a graph model fuses the views by removing overlapping interest points, thereby exploiting their consistency. GM-GS-DSDL is then formulated to discover the latent correlation among the multiple views. In addition, we release a new multi-view action dataset with RGB, depth, and skeleton data (called CVS-MV-RGBD). Large-scale experiments on the multi-view IXMAS and CVS-MV-RGBD datasets show that exploiting the consistency of different views through the graph model is very useful; moreover, the dictionaries learned simultaneously for each view by GM-GS-DSDL further improve fusion performance. Comparative experiments demonstrate that the proposed algorithm obtains competitive performance against state-of-the-art methods. (C) 2014 Elsevier B.V. All rights reserved.
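The group-sparsity regularizer at the core of the abstract's GM-GS-DSDL formulation can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows, for a fixed dictionary D and a single signal y, how sparse codes with an l2,1 (group) penalty can be computed by proximal gradient descent, whose proximal step shrinks whole groups of coefficients toward zero at once. All names, group layouts, and parameter values below are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(x, groups, t):
    """Proximal operator of the l2,1 group-sparsity norm:
    each group of coefficients is shrunk toward zero as a unit,
    and groups whose l2 norm is below t are zeroed entirely."""
    out = np.zeros_like(x)
    for g in groups:
        norm = np.linalg.norm(x[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * x[g]
    return out

def group_sparse_code(D, y, groups, lam=0.1, iters=200):
    """Solve min_a 0.5*||y - D a||^2 + lam * sum_g ||a_g||_2
    by proximal gradient descent (ISTA). D: (m, k), y: (m,)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - y)                      # gradient of the data term
        a = group_soft_threshold(a - step * grad, groups, step * lam)
    return a

# Toy example: a dictionary with two atom groups; the signal is
# synthesized from group 0 only, so group 1 should stay (near) inactive.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 6))
D /= np.linalg.norm(D, axis=0)                        # unit-norm atoms
groups = [np.arange(0, 3), np.arange(3, 6)]
y = D[:, :3] @ np.array([1.0, -0.5, 0.8])
a = group_sparse_code(D, y, groups, lam=0.05)
```

In a full dictionary-learning loop, a coding step like this would alternate with a dictionary-update step; the paper's method additionally couples the per-view dictionaries through discriminative and graph-model terms, which are omitted here.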
Pages: 83-97
Number of pages: 15