Spatio-Temporal Analysis for Human Action Detection and Recognition in Uncontrolled Environments

Times cited: 28

Authors:

Liu, Dianting [1]

Yan, Yilin [1]

Shyu, Mei-Ling [1]

Zhao, Guiru [2]

Chen, Min [3]

Affiliations:

[1] Univ Miami, Dept Elect & Comp Engn, Coral Gables, FL 33124 USA

[2] China Earthquake Networks Ctr, Beijing, Peoples R China

[3] Univ Washington Bothell, Comp & Software Syst, Bothell, WA USA

Source:

INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT | 2015 / Vol. 6 / No. 1

Keywords:

Action Detection; Action Recognition; Boosting Classifier; Hamming Distance; Gaussian Mixture Models; Sparse Representation; Spatio-Temporal

DOI:

10.4018/ijmdem.2015010101

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification code:

081202; 0835

Abstract:

Understanding the semantic meaning of human actions captured in unconstrained environments has broad applications in fields ranging from patient monitoring and human-computer interaction to surveillance systems. However, while great progress has been made on automatic human action detection and recognition in videos captured in controlled/constrained environments, most existing approaches perform unsatisfactorily on videos with uncontrolled/unconstrained conditions (e.g., significant camera motion, background clutter, scaling, and lighting conditions). To address this issue, the authors propose a robust human action detection and recognition framework that works effectively on videos taken in either controlled or uncontrolled environments. Specifically, the authors integrate the optical flow field and the Harris3D corner detector to generate a new spatio-temporal representation for each video sequence, from which a general Gaussian mixture model (GMM) is learned. The mean vectors of all Gaussian components in the resulting GMM are concatenated to form the GMM supervector used for video action recognition. The authors then build a boosting classifier from a set of sparse representation classifiers and Hamming distance classifiers to improve recognition accuracy. Experimental results on two widely used public data sets, KTH and UCF YouTube Action, show that the proposed framework outperforms other state-of-the-art approaches on both action detection and recognition.
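The abstract says the GMM supervector is formed by concatenating the mean vectors of all Gaussian components. The following is a minimal pure-Python sketch of that idea under assumptions the abstract does not confirm: diagonal covariances, a general GMM fit with plain EM on features pooled from training videos, and per-video means obtained by mean-only MAP adaptation with a relevance factor (a technique borrowed from GMM-supervector speaker verification, swapped in here because the abstract does not say how per-video means are derived). The names `fit_gmm`, `gmm_supervector`, and `relevance`, the component count, and the random 3-D feature vectors (stand-ins for the optical-flow/Harris3D descriptors) are all hypothetical, not the authors' code.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
# (weights, means, variances) of a diagonal-covariance GMM
GMM = Tuple[List[float], List[Vector], List[Vector]]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(data: List[Vector], gmm: GMM) -> List[List[float]]:
    """E-step: posterior probability of each component for each vector."""
    weights, means, variances = gmm
    resp = []
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def fit_gmm(data: List[Vector], k: int, iters: int = 50,
            seed: int = 0, var_floor: float = 1e-3) -> GMM:
    """Hypothetical helper: fit a general GMM on pooled features with plain EM."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(x) for x in rng.sample(data, k)]  # init means from data points
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = responsibilities(data, (weights, means, variances))
        for j in range(k):  # M-step for component j
            nk = sum(r[j] for r in resp) + 1e-10  # soft count
            weights[j] = nk / len(data)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nk
                        for d in range(dim)]
            variances[j] = [max(var_floor,
                                sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nk)
                            for d in range(dim)]
    return weights, means, variances


def gmm_supervector(video: List[Vector], gmm: GMM,
                    relevance: float = 16.0) -> Vector:
    """Hypothetical helper: MAP-adapt each mean to one video, then concatenate."""
    _, means, _ = gmm
    resp = responsibilities(video, gmm)
    supervector: Vector = []
    for j, mu in enumerate(means):
        nk = sum(r[j] for r in resp)
        for d in range(len(mu)):
            first_order = sum(r[j] * x[d] for r, x in zip(resp, video))
            # Interpolate between this video's data and the general-model mean.
            supervector.append((first_order + relevance * mu[d]) / (nk + relevance))
    return supervector


if __name__ == "__main__":
    rng = random.Random(1)
    # Random 3-D stand-ins for per-frame spatio-temporal descriptors.
    pooled = ([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
              + [[rng.gauss(5.0, 1.0) for _ in range(3)] for _ in range(200)])
    general = fit_gmm(pooled, k=2)
    video = [[rng.gauss(5.0, 1.0) for _ in range(3)] for _ in range(30)]
    sv = gmm_supervector(video, general)
    print(len(sv))  # 2 components x 3 dims -> 6
```

Adapting the means of one shared general model, rather than fitting an independent GMM per video, keeps component j of every video aligned with component j of the general model, so the concatenated supervectors have a fixed length (components x feature dimension) and are comparable across videos. Whether the paper uses this adaptation step or another way of deriving per-video means is not stated in the abstract.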

Pages: 1–18

Page count: 18

Number of references: 40
