Exploring STIP-based models for recognizing human interactions in TV videos

Cited by: 23

Authors:

Marin-Jimenez, Manuel J. [1]

Yeguas, Enrique [1]

Perez de la Blanca, Nicolas [2]

Affiliations:

[1] Univ Cordoba, Dept Comp Sci & Numer Anal, E-14071 Cordoba, Spain

[2] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain

Source:

PATTERN RECOGNITION LETTERS | 2013 / Vol. 34 / Issue 15

Keywords:

Human interaction; TV video; STIP; BOW; RECOGNITION;

DOI:

10.1016/j.patrec.2012.10.018

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human motion recognition - action (HAR) or interaction (HIR) - in real video data is identified as a very challenging task. In the last few years, models of increasing complexity have been proposed in order to improve performance on this task. However, it still remains unclear whether it is the features or the models that deserve the increase in complexity. In this paper, an evaluation of this problem is carried out for the HIR task. For that purpose, we compare the results obtained in our experiments - using STIP-based features and BOW models as a basis, combined with a standard classifier - with some of the more effective and recent approaches that use alternative representation models. We perform a comprehensive experimental study on two state-of-the-art databases in HIR: TV Human interactions and UT-interactions. We compare the results of our experiments with recent results published on these datasets. In addition, we run cross-data experiments on the Hollywood-2 dataset in order to study how well the trained models generalize across datasets. The most relevant result is that the model combining STIP + BOW is competitive in the HIR task in comparison with the most complex ones. It is also shown that the vocabulary learning subtask can be improved by using compression algorithms on a large enough initial set of features. In contrast to other categorization tasks, the context does not help: the results show that dense sampling of STIP is the best choice, but only when it is used inside the region of interest of the interaction. (C) 2012 Elsevier B.V. All rights reserved.

Pages: 1819-1828

Number of pages: 10
