YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition

Cited by: 314

Authors:

Guadarrama, Sergio [1]

Krishnamoorthy, Niveda [2]

Malkarnenkar, Girish [2]

Venugopalan, Subhashini [2]

Mooney, Raymond [2]

Darrell, Trevor [3]

Saenko, Kate [4]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] UT Austin, Austin, TX USA

[3] Univ Calif Berkeley, ICSI, Berkeley, CA 94720 USA

[4] UMass Lowell, Lowell, MA USA

Source:

2013 IEEE International Conference on Computer Vision (ICCV) | 2013

DOI:

10.1109/ICCV.2013.337

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities "in-the-wild". We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction for a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help to choose an appropriate level of generalization, and priors learned from web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects; we also use a web-scale language model to "fill in" novel verbs, i.e. when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate it is able to generate short sentence descriptions of video clips better than baseline approaches.
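The back-off idea in the abstract (fall back to a less specific but still plausible answer when a specific one is not reliable) can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy hierarchy, the `back_off` function, the score aggregation by summing leaf probabilities, and the `threshold` value are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical toy hierarchy: child -> parent (None marks the root).
ROOT = "entity"
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def depth(node: str) -> int:
    """Number of edges between a node and the root."""
    d = 0
    while PARENT[node] is not None:
        node = PARENT[node]
        d += 1
    return d


def node_scores(leaf_probs: Dict[str, float]) -> Dict[str, float]:
    """Score each node as the sum of the leaf probabilities beneath it."""
    scores = {n: 0.0 for n in PARENT}
    for leaf, p in leaf_probs.items():
        node: Optional[str] = leaf
        while node is not None:
            scores[node] += p
            node = PARENT[node]
    return scores


def back_off(leaf_probs: Dict[str, float], threshold: float = 0.7) -> str:
    """Return the most specific node whose aggregated score meets the threshold.

    Assumed rule for illustration only: prefer deeper nodes, break ties by score,
    and fall back to the root if nothing is confident enough.
    """
    scores = node_scores(leaf_probs)
    confident = [n for n, s in scores.items() if s >= threshold]
    if not confident:
        return ROOT
    return max(confident, key=lambda n: (depth(n), scores[n]))
```

With a classifier that is unsure between "dog" and "cat", the sketch generalizes to "animal"; with a confident "dog" score it keeps the specific label. Raising `threshold` trades specificity for reliability.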

Pages: 2712-2719

Page count: 8
