Action recognition using saliency learned from recorded human gaze

Cited by: 7

Authors:

Stefic, Daria [1]

Patras, Ioannis [1]

Affiliations:

[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England

Source:

IMAGE AND VISION COMPUTING | 2016 / Vol. 52

Keywords:

Action recognition; Saliency; Support Vector Machine (SVM); Latent variable; 3D Convolutional Neural Network (3D CNN);

DOI:

10.1016/j.imavis.2016.06.006

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper addresses the problem of recognition and localization of actions in image sequences, by utilizing, in the training phase only, gaze tracking data of people watching videos depicting the actions in question. First, we learn discriminative action features at the areas of gaze fixation and train a Convolutional Network that predicts areas of fixation (i.e. salient regions) from raw image data. Second, we propose a Support Vector Machine-based recognition method for joint recognition and localization, in which the bounding box of the action in question is considered as a latent variable. In our formulation the optimization attempts to both minimize the classification cost and maximize the saliency within the bounding box. We show that the results obtained with the optimization where saliency within the bounding box is maximized outperform the results obtained when saliency within the bounding box is not maximized, i.e. when only classification cost is minimized. Furthermore, the results that we obtain outperform the state-of-the-art results on the UCF sports dataset. (C) 2016 Elsevier B.V. All rights reserved.
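
The abstract does not give the exact objective, so the following is only a minimal sketch of the inference step it describes: choosing a latent bounding box that trades off a linear classifier score against the saliency inside the box. The names `best_latent_box`, `lam`, and the use of mean saliency as the saliency term are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def integral_image(saliency):
    """Summed-area table with a zero row and column prepended."""
    return np.pad(saliency, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, box):
    """Saliency mass inside box = (top, left, bottom, right), bottom/right exclusive."""
    t, l, b, r = box
    return ii[b, r] - ii[t, r] - ii[b, l] + ii[t, l]

def best_latent_box(w, features, saliency, boxes, lam=1.0):
    """Return the candidate box with the highest combined score.

    Assumed (hypothetical) scoring rule:
        score(box) = w . phi(box) + lam * mean saliency inside box
    `features[i]` is the feature vector phi for `boxes[i]`.
    """
    ii = integral_image(saliency)
    best_box, best_score = None, -np.inf
    for box, phi in zip(boxes, features):
        t, l, b, r = box
        area = (b - t) * (r - l)
        score = float(w @ phi) + lam * box_sum(ii, box) / area
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score
```

With `lam=0.0` the rule reduces to picking the box by classification score alone, which corresponds to the baseline the abstract compares against.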

Pages: 195-205

Number of pages: 11
