Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos

Cited by: 33
Authors
Ballan, Lamberto [1 ]
Bertini, Marco [1 ]
Del Bimbo, Alberto [1 ]
Seidenari, Lorenzo [1 ]
Serra, Giuseppe [1 ]
Affiliation
[1] Univ Florence, Media Integrat & Commun Ctr, I-50139 Florence, Italy
Keywords
Human action categorization; spatio-temporal local descriptors; visual codebooks; RECOGNITION; CATEGORIES; DENSE;
DOI
10.1109/TMM.2012.2191268
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recognition and classification of human actions for the annotation of unconstrained video sequences has proven challenging because of variations in the environment, the appearance of actors, the modalities in which the same action is performed by different persons, its speed and duration, and the points of view from which the event is observed. This variability is reflected in the difficulty of defining effective descriptors and deriving appropriate and effective codebooks for action categorization. In this paper, we propose a novel and effective solution for classifying human actions in unconstrained videos. It improves on previous contributions through the definition of a novel local descriptor that uses the image gradient and optic flow to model, respectively, the appearance and motion of human actions at interest point regions. To form the codebook, we employ radius-based clustering with soft assignment in order to create a rich vocabulary that can account for the high variability of human actions. We show that our solution achieves very good performance with no need for parameter tuning. We also show that a strong reduction in computation time can be obtained by reducing codebook size with Deep Belief Networks, with little loss of accuracy.
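The codebook-formation step described in the abstract (radius-based clustering followed by soft assignment of descriptors to codewords) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: it uses a greedy leader-clustering variant of radius-based clustering and a Gaussian kernel for soft assignment, with toy 2-D "descriptors" standing in for spatio-temporal local descriptors.

```python
import numpy as np

def radius_based_codebook(descriptors, radius):
    """Greedy radius-based (leader) clustering: a descriptor seeds a new
    codeword whenever it lies farther than `radius` from every existing
    codeword. Illustrative sketch only; the paper's exact variant may differ."""
    codebook = []
    for d in descriptors:
        if not codebook or min(np.linalg.norm(d - c) for c in codebook) > radius:
            codebook.append(d)
    return np.array(codebook)

def soft_assign(descriptor, codebook, sigma=1.0):
    """Soft assignment: weight every codeword by a Gaussian of its distance
    to the descriptor, then normalize the weights to sum to one."""
    dists = np.linalg.norm(codebook - descriptor, axis=1)
    weights = np.exp(-dists ** 2 / (2 * sigma ** 2))
    return weights / weights.sum()

# Toy data: two well-separated clusters of 2-D descriptors.
rng = np.random.default_rng(0)
descs = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                   rng.normal(3.0, 0.1, (20, 2))])

codebook = radius_based_codebook(descs, radius=1.0)
# Bag-of-words histogram: accumulate soft-assignment weights per codeword.
hist = sum(soft_assign(d, codebook, sigma=0.5) for d in descs)
```

Because each descriptor distributes a total weight of one across the codewords, the histogram mass equals the number of descriptors; the radius parameter, rather than a fixed cluster count, controls vocabulary size, which is what lets the vocabulary grow with the variability of the data.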
Pages: 1234-1245 (12 pages)