A hierarchical representation for human action recognition in realistic scenes

Cited by: 0
Authors
Qing Lei
Hongbo Zhang
Minghai Xin
Yiqiao Cai
Affiliations
[1] University of Huaqiao, College of Computer Science and Technology
[2] University of Xiamen, Department of Cognitive Science
[3] Fujian Provincial Key Laboratory of Data Intensive Computing
Source
Multimedia Tools and Applications | 2018 / Vol. 77
Keywords
Action recognition; Realistic scenes; Feature selection; Action modelling; Video processing
DOI
Not available
Abstract
Bag-of-features (BoF) representations built on statistics of local space-time features are very popular for human action recognition because of their simplicity. However, large quantization error and weak semantic representation reduce the discriminative ability of the traditional BoF model when it is applied to human action recognition in realistic scenes. To address these problems, we investigate the generalization ability of the BoF framework for action representation, as well as more effective feature encoding of high-level semantics. To this end, we present a two-layer hierarchical codebook learning framework for human action classification in realistic scenes. In the first layer (action modelling), a superpixel GMM model is developed to filter out noisy features in STIP extraction that result from cluttered backgrounds, and a class-specific learning strategy is employed on the refined STIP feature space to construct compact and descriptive in-class action codebooks. In the second layer (action representation), an LDA-Km learning algorithm is proposed for feature dimensionality reduction and for acquiring a more discriminative inter-class action codebook for classification. We take advantage of the hierarchical framework's representational power and the efficiency of the BoF model to boost recognition performance in realistic scenes. In experiments, the proposed method is evaluated on four benchmark datasets: KTH, YouTube (UCF11), UCF Sports and Hollywood2. Experimental results show that the proposed approach achieves higher recognition accuracy than the baseline method. Comparisons with state-of-the-art works demonstrate its competitiveness in both recognition performance and time complexity.
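As a rough illustration of the class-specific codebook idea mentioned in the abstract, the sketch below learns one small k-means codebook per action class, stacks them, and encodes a video as a BoF histogram. It is not the authors' implementation: the superpixel GMM filtering and the LDA-Km second layer are not reproduced, plain k-means stands in for the codebook learner, and the function names (`kmeans`, `build_class_codebooks`, `bof_histogram`), the 8-dimensional descriptors, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def build_class_codebooks(features_by_class, words_per_class):
    """Simplified first layer: one k-means codebook per class, stacked row-wise."""
    books = [kmeans(feats, words_per_class, seed=i)
             for i, feats in enumerate(features_by_class.values())]
    return np.vstack(books)

def bof_histogram(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword; L1-normalise counts."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(float)
    return counts / counts.sum()

# Synthetic stand-ins for local space-time descriptors (hypothetical data).
rng = np.random.default_rng(42)
features_by_class = {
    "walk": rng.normal(loc=0.0, scale=1.0, size=(200, 8)),
    "run": rng.normal(loc=5.0, scale=1.0, size=(200, 8)),
}
codebook = build_class_codebooks(features_by_class, words_per_class=4)

# Encode one "video" whose descriptors resemble the "run" class.
video_descriptors = rng.normal(loc=5.0, scale=1.0, size=(50, 8))
hist = bof_histogram(video_descriptors, codebook)
```

Because the stacked codebook keeps each class's codewords in a contiguous block, the histogram mass of a video tends to concentrate in the block of the class it resembles, which is the intuition behind learning in-class codebooks before any inter-class refinement.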
Pages: 11403-11423
Page count: 20