Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos

Cited by: 22
Authors
Burghouts, G. J. [1]
Schutte, K. [2]
Bouma, H. [3]
den Hollander, R. J. M. [2]
Affiliations
[1] TNO, Intelligent Imaging Res Grp, The Hague, Netherlands
[2] TNO, The Hague, Netherlands
[3] TNO, Field Comp Vis & Pattern Recognit, The Hague, Netherlands
Keywords
Human action detection; Sparse representation; Pose estimation; Interactions between people; Spatiotemporal features; STIP; Tracking of humans; Person detection; Event recognition; Random forest; Support vector machines; RECOGNITION;
DOI
10.1007/s00138-013-0514-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a system that detects 48 human actions in realistic videos, ranging from simple actions such as 'walk' to complex actions such as 'exchange'. We propose a method that substantially improves detection performance. The improvement stems from a different approach to three themes: sample selection, two-stage classification, and the combination of multiple features. First, we show that sampling can be improved by smart selection of the negatives. Second, we show that exploiting the posteriors of all 48 actions through two-stage classification greatly improves the detection of each action. Third, we show how low-level motion features and high-level object features should be combined. Together, these three changes improve human action detection performance by a factor of 2.37 on the visint.org test set of 1,294 realistic videos. In addition, we demonstrate that selective sampling and the two-stage setup improve on standard bag-of-features methods on the UT-Interaction dataset, and that our method outperforms the state of the art on the IXMAS dataset.
Pages: 85-98
Number of pages: 14