Video Tomographs and a Base Detector Selection Strategy for Improving Large-Scale Video Concept Detection

Cited by: 4

Authors:

Sidiropoulos, Panagiotis [1]

Mezaris, Vasileios [2]

Kompatsiaris, Ioannis [2]

Affiliations:

[1] UCL, Mullard Space Sci Lab, London WC1E 6BT, England

[2] Ctr Res & Technol Hellas, Inst Informat Technol, Thermi 57001, Greece

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2014 / Vol. 24 / Issue 07

Keywords:

Feature extraction; genetic algorithms; image sequence analysis; machine learning algorithms; video concept detection; video signal processing;

DOI:

10.1109/TCSVT.2014.2302554

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In this paper, we address the problem of video concept detection, with the aim of using the detection results for more effective concept-based video retrieval. The key novelties of this paper are as follows: 1) The use of spatio-temporal video slices (tomographs) in the same way that visual keyframes are typically used in video concept detection schemes. These spatio-temporal slices compactly capture motion patterns that are useful for detecting semantic concepts, and they are used for training a number of base detectors. The latter augment the set of keyframe-based base detectors that can be trained using different frame representations. 2) The introduction of a generic methodology, built upon a genetic algorithm, for controlling which subset of the available base detectors (and, consequently, which subset of the possible shot representations) should be combined to develop an optimal detector for each specific concept. This methodology is directly applicable to the learning of hundreds of diverse concepts, while diverging from the one-size-fits-all approach that is typically used in problems of this size. The proposed techniques are evaluated on the datasets of the 2011 and 2012 Semantic Indexing Task of TRECVID, each comprising several hundred hours of heterogeneous video clips and ground-truth annotations for tens of concepts that exhibit significant variation in terms of generality, complexity, and human participation. The experimental results demonstrate the merit of the proposed techniques.
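The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the tomographs are cut through the central row and central column of each frame, the selected base detectors are fused by averaging their scores, fitness is average precision on a validation set, and the genetic algorithm uses truncation selection, one-point crossover and bit-flip mutation. All function and parameter names (`extract_tomographs`, `fitness`, `select_detectors`, `pop_size`, `p_mut`) are hypothetical.

```python
import random
import numpy as np


def extract_tomographs(frames):
    """Cut two spatio-temporal slices from a shot of shape (T, H, W).

    Assumption: the slices pass through the central row and the central
    column of every frame; the abstract does not specify which lines are used.
    """
    frames = np.asarray(frames)
    _, height, width = frames.shape
    horizontal = frames[:, height // 2, :]  # shape (T, W): time x width
    vertical = frames[:, :, width // 2]     # shape (T, H): time x height
    return horizontal, vertical


def fitness(mask, scores, labels):
    """Average precision of the mean score of the selected base detectors.

    scores: (n_detectors, n_shots) detector outputs; labels: 0/1 per shot.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    if not any(mask):
        return 0.0  # an empty subset cannot rank anything
    fused = scores[np.array(mask, dtype=bool)].mean(axis=0)
    ranked = labels[np.argsort(-fused)]     # labels sorted by descending score
    n_pos = ranked.sum()
    if n_pos == 0:
        return 0.0
    precision_at_k = np.cumsum(ranked) / np.arange(1, len(ranked) + 1)
    return float((precision_at_k * ranked).sum() / n_pos)


def select_detectors(scores, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for a good binary detector mask for one concept.

    Assumes at least two base detectors and pop_size >= 4.
    """
    rng = random.Random(seed)
    n = np.asarray(scores).shape[0]
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, scores, labels), reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection; the best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]  # mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, scores, labels))
```

In this sketch, `select_detectors` would be run once per concept, so each concept can end up with its own subset of keyframe-based and tomograph-based detectors rather than a single shared configuration.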

Pages: 1251-1264

Page count: 14
