Action recognition using saliency learned from recorded human gaze

Cited by: 7
Authors
Stefic, Daria [1]
Patras, Ioannis [1]
Affiliations
[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England
Keywords
Action recognition; Saliency; Support Vector Machine (SVM); Latent variable; 3D Convolutional Neural Network (3D CNN)
DOI
10.1016/j.imavis.2016.06.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the problem of recognition and localization of actions in image sequences, by utilizing, in the training phase only, gaze tracking data of people watching videos depicting the actions in question. First, we learn discriminative action features at the areas of gaze fixation and train a Convolutional Network that predicts areas of fixation (i.e. salient regions) from raw image data. Second, we propose a Support Vector Machine-based recognition method for joint recognition and localization, in which the bounding box of the action in question is considered as a latent variable. In our formulation the optimization attempts to both minimize the classification cost and maximize the saliency within the bounding box. We show that the results obtained with the optimization where saliency within the bounding box is maximized outperform the results obtained when saliency within the bounding box is not maximized, i.e. when only classification cost is minimized. Furthermore, the results that we obtain outperform the state-of-the-art results on the UCF sports dataset. (C) 2016 Elsevier B.V. All rights reserved.
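The latent-variable idea in the abstract, where the bounding box is chosen to trade off classification score against saliency inside the box, can be illustrated with a minimal sketch. This is not the authors' formulation: the function names, the weight `lam`, the use of mean (rather than summed) saliency, and the precomputed per-box `class_scores` are all assumptions made for illustration only.

```python
import numpy as np

def box_saliency(integral, box):
    """Sum of saliency inside box (x0, y0, x1, y1), end-exclusive, via an integral image."""
    x0, y0, x1, y1 = box
    return integral[y1, x1] - integral[y0, x1] - integral[y1, x0] + integral[y0, x0]

def infer_latent_box(saliency, boxes, class_scores, lam=1.0):
    """Pick the candidate box maximizing class score + lam * mean saliency inside it.

    Hypothetical scoring rule: the abstract only says the optimization both
    minimizes classification cost and maximizes saliency within the box.
    """
    # Integral image padded with a zero row and column, so each box sum needs four lookups.
    integral = np.zeros((saliency.shape[0] + 1, saliency.shape[1] + 1))
    integral[1:, 1:] = saliency.cumsum(axis=0).cumsum(axis=1)

    best_box, best_score = None, -np.inf
    for box, cls in zip(boxes, class_scores):
        x0, y0, x1, y1 = box
        area = (x1 - x0) * (y1 - y0)
        score = cls + lam * box_saliency(integral, box) / area
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score

# Toy frame: a 6x6 saliency map with a fully salient 2x2 patch in the middle.
saliency = np.zeros((6, 6))
saliency[2:4, 2:4] = 1.0
boxes = [(0, 0, 2, 2), (2, 2, 4, 4), (0, 0, 6, 6)]
class_scores = [0.5, 0.4, 0.45]  # assumed outputs of some per-box classifier

box, score = infer_latent_box(saliency, boxes, class_scores, lam=1.0)
box_no_saliency, _ = infer_latent_box(saliency, boxes, class_scores, lam=0.0)
```

With `lam=0.0` the choice depends on the classifier score alone; a positive `lam` pulls the selected box toward regions the saliency map marks as likely fixation areas, which is the contrast the abstract draws between the two optimization variants.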
Pages: 195-205
Number of pages: 11