Fusing $\mathcal{R}$ Features and Local Features with Context-Aware Kernels for Action Recognition

Cited by: 0
Authors
Chunfeng Yuan
Baoxin Wu
Xi Li
Weiming Hu
Stephen Maybank
Fangshi Wang
Affiliations
[1] CAS, National Laboratory of Pattern Recognition, Institute of Automation
[2] Zhejiang University, College of Computer Science and Technology
[3] Birkbeck College, Department of Computer Science and Information Systems
[4] Beijing Jiaotong University, School of Software Engineering
Keywords
Action recognition; Spatio-temporal interest points; 3D $\mathcal{R}$ transform; Hypergraph; Context-aware kernel
DOI
10.1007/s11263-015-0867-0
Abstract
The performance of action recognition in video sequences depends significantly on the representation of actions and the similarity measurement between the representations. In this paper, we combine two kinds of features extracted from the spatio-temporal interest points with context-aware kernels for action recognition. For the action representation, local cuboid features extracted around interest points are widely used within a Bag of Visual Words (BOVW) model. Such representations, however, ignore potentially valuable information about the global spatio-temporal distribution of interest points. We propose a new global feature to capture the detailed geometrical distribution of interest points. It is calculated by using the 3D $\mathcal{R}$ transform, which is defined as an extended 3D discrete Radon transform, followed by the application of a two-directional two-dimensional principal component analysis. For the similarity measurement, we model a video set as an optimized probabilistic hypergraph and propose a context-aware kernel to measure high-order relationships among videos. The context-aware kernel is more robust to noise and outliers in the data than the traditional context-free kernel, which considers only the pairwise relationships between videos. The hyperedges of the hypergraph are constructed based on a learnt Mahalanobis distance metric. Any disturbing information from other classes is excluded from each hyperedge. Finally, a multiple kernel learning algorithm is designed by integrating the $l_{2}$ norm regularization into a linear SVM classifier to fuse the $\mathcal{R}$ feature and the BOVW representation for action recognition. Experimental results on several datasets demonstrate the effectiveness of the proposed approach for action recognition.
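As a rough illustration of the BOVW representation mentioned in the abstract, the sketch below assigns each local descriptor to its nearest codeword and builds a normalized histogram. It is not the authors' pipeline: the function names, the toy codebook, and the toy descriptors are all hypothetical, and codebook learning and cuboid descriptor extraction are assumed to have happened elsewhere.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_histogram(
    descriptors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[float]:
    """Return the L1-normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Hard assignment: index of the closest codeword.
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(descriptor, codebook[k]),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data (hypothetical): three 2-D codewords and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
histogram = bovw_histogram(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

The histogram records only how often each codeword occurs, not where or when the interest points appear, which is the loss of global spatio-temporal information that the abstract says the $\mathcal{R}$ feature is meant to address.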
Pages: 151-171
Page count: 20