LARGE-SCALE VIDEO EVENT CLASSIFICATION USING DYNAMIC TEMPORAL PYRAMID MATCHING OF VISUAL SEMANTICS

Cited by: 0

Authors:

Codella, Noel C. F. [1,2]

Hua, Gang

Cao, Liangliang [1,2]

Merler, Michele [1,2]

Gong, Leiguang [1,2]

Hill, Matt [1,2]

Smith, John R. [1,2]

Affiliations:

[1] IBM TJ Watson Res Ctr, Multimedia Res Grp, Yorktown Hts, NY 10598 USA

[2] Stevens Inst Technol, Dept Comp Sci, Hoboken, NJ 07030 USA

Source:

2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013) | 2013

Keywords:

pyramid; event; video; temporal; semantics; SEQUENCE; KERNEL;

DOI:

Not available

Chinese Library Classification number:

TB8 [Photographic technology];

Discipline classification code:

0804 ;

Abstract:

Video event classification and retrieval has recently emerged as a challenging research topic. In addition to the variation in appearance of visual content and the large scale of the collections to be analyzed, this domain presents new and unique challenges in the modeling of the explicit temporal structure and implicit temporal trends of content within the video events. In this study, we present a technique for video event classification that captures temporal information over semantics using a scalable and efficient modeling scheme. An architecture for partitioning videos into a linear temporal pyramid, using segments of equal length and segments determined by the patterns of the underlying data, is applied over a rich underlying semantic description at the frame level using a taxonomy of nearly 1000 concepts containing 500,000 training images. Forward model selection with data bagging is used to prune the space of temporal features and data for efficiency. The system is implemented in the Hadoop MapReduce environment for arbitrary scalability. Our method is applied to the TRECVID Multimedia Event Detection 2012 task. Results demonstrate a significant boost in performance of over 50%, in terms of mean average precision, compared to common max or average pooling, and 17.7% compared to more complex pooling strategies that ignore temporal content.
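The equal-length part of the temporal pyramid described in the abstract can be illustrated with a minimal sketch. It assumes that each frame is already represented by a vector of per-concept scores, that pyramid level k splits the video into k contiguous segments of near-equal length, and that each segment is summarized by average pooling. The function names, the number of levels, and the pooling choice are illustrative assumptions, not details taken from the paper; the data-driven segmentation, the forward model selection, and the MapReduce implementation are not covered.

```python
from typing import List, Sequence, Tuple


def segment_bounds(n_frames: int, n_segments: int) -> List[Tuple[int, int]]:
    """Split range(n_frames) into n_segments contiguous, near-equal spans."""
    return [
        (i * n_frames // n_segments, (i + 1) * n_frames // n_segments)
        for i in range(n_segments)
    ]


def mean_pool(frames: Sequence[Sequence[float]], n_concepts: int) -> List[float]:
    """Average each concept score over the given frames (zeros if no frames)."""
    if not frames:
        return [0.0] * n_concepts
    return [sum(f[c] for f in frames) / len(frames) for c in range(n_concepts)]


def temporal_pyramid(frame_scores: Sequence[Sequence[float]], levels: int = 3) -> List[float]:
    """Concatenate pooled segment descriptors over all pyramid levels.

    Assumption: level k uses k equal-length segments, so the output has
    (1 + 2 + ... + levels) * n_concepts entries.
    """
    n_concepts = len(frame_scores[0])
    feature: List[float] = []
    for level in range(1, levels + 1):
        for start, end in segment_bounds(len(frame_scores), level):
            feature.extend(mean_pool(frame_scores[start:end], n_concepts))
    return feature


# Toy video: 4 frames, 2 concepts; the second concept is active early,
# the first concept is active late.
frames = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
descriptor = temporal_pyramid(frames, levels=2)
```

In this toy case the whole-video average is identical for both concepts, so plain average pooling cannot tell which concept occurred first, whereas the two half-video segments at level 2 preserve that ordering. This is the kind of temporal information the abstract contrasts with max or average pooling.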

Citation

Pages: 2862-2866

Number of pages: 5

References: 31 in total

