YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition

Times cited: 298
Authors
Guadarrama, Sergio [1]
Krishnamoorthy, Niveda [2]
Malkarnenkar, Girish [2]
Venugopalan, Subhashini [2]
Mooney, Raymond [2]
Darrell, Trevor [3]
Saenko, Kate [4]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] UT Austin, Austin, TX USA
[3] Univ Calif Berkeley, ICSI, Berkeley, CA 94720 USA
[4] UMass Lowell, Lowell, MA USA
Source
2013 IEEE International Conference on Computer Vision (ICCV), 2013
DOI
10.1109/ICCV.2013.337
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities "in-the-wild". We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction using a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help choose an appropriate level of generalization, and priors learned from web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects; we also use a web-scale language model to "fill in" novel verbs, i.e. when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate it is able to generate short sentence descriptions of video clips better than baseline approaches.
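The abstract names two mechanisms without spelling them out: backing off to a more general node of a semantic hierarchy when a specific prediction is uncertain, and using language-model priors to penalize implausible actor/action/object combinations. The Python sketch that follows shows one generic way each idea could work; it is an illustration under stated assumptions, not the paper's published algorithm. The verb hierarchy, all probabilities, the `threshold`, `weight`, and `floor` parameters, and the helper names `hedge` and `best_triple` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical verb hierarchy (child -> parent); "act" is the root.
PARENT = {
    "slice": "cut", "chop": "cut",
    "cut": "prepare_food", "peel": "prepare_food",
    "ride": "move", "walk": "move",
    "prepare_food": "act", "move": "act",
}


def ancestors(node):
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT.get(node)


def depth(node):
    """Number of edges between the node and the root."""
    return sum(1 for _ in ancestors(node)) - 1


def hedge(leaf_probs, threshold=0.6):
    """Return the deepest node whose accumulated leaf probability reaches threshold.

    Assumes leaf probabilities sum to about 1 and threshold <= 1, so the root qualifies.
    """
    mass = defaultdict(float)
    for leaf, p in leaf_probs.items():
        for node in ancestors(leaf):
            mass[node] += p  # each node collects the mass of all leaves below it
    candidates = [n for n, m in mass.items() if m >= threshold]
    # Prefer specificity (depth); break ties by higher accumulated probability.
    return max(candidates, key=lambda n: (depth(n), mass[n]))


def best_triple(subj_scores, verb_scores, obj_scores, lm_prior, weight=1.0, floor=1e-6):
    """Pick the (subject, verb, object) maximizing visual log-score plus a weighted log prior."""
    def score(triple):
        s, v, o = triple
        visual = math.log(subj_scores[s]) + math.log(verb_scores[v]) + math.log(obj_scores[o])
        prior = lm_prior.get(triple, floor)  # unseen triples get a small floor probability
        return visual + weight * math.log(prior)
    return max(product(subj_scores, verb_scores, obj_scores), key=score)


if __name__ == "__main__":
    verb_probs = {"slice": 0.35, "chop": 0.30, "peel": 0.15, "ride": 0.10, "walk": 0.10}
    print(hedge(verb_probs, threshold=0.6))  # cut

    subjects = {"person": 0.9, "dog": 0.1}
    verbs = {"slice": 0.6, "ride": 0.4}
    objects = {"onion": 0.3, "bicycle": 0.7}
    prior = {
        ("person", "slice", "onion"): 0.02,
        ("person", "ride", "bicycle"): 0.03,
        ("dog", "ride", "bicycle"): 0.001,
    }
    print(best_triple(subjects, verbs, objects, prior))  # ('person', 'ride', 'bicycle')
```

With these toy numbers, `hedge` returns the intermediate node `cut` because neither `slice` nor `chop` is confident enough on its own, and the prior makes `best_triple` prefer ("person", "ride", "bicycle") over the visually stronger but implausible ("person", "slice", "bicycle"). Raising `threshold` trades specificity for a safer answer; setting `weight=0` disables the prior entirely.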
Pages: 2712-2719
Number of pages: 8