A framework for automatic semantic video annotation

Cited by: 14
Authors
Altadmri, Amjad [1 ]
Ahmed, Amr [1 ]
Affiliations
[1] Lincoln Univ, Sch Comp Sci, Lincoln, England
Keywords
Semantic video annotation; Video search engine; Video information retrieval; Commonsense knowledgebases; Semantic gap; VISUAL-SEARCH; CONCEPTNET; RETRIEVAL;
DOI
10.1007/s11042-013-1363-6
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in indexing any video, because they represent its semantics in a user-friendly way suited to search and retrieval. Ideally, this annotation should be inspired by the way humans cognitively perceive and describe videos. The difference between the low-level visual content and the corresponding human perception is referred to as the 'semantic gap'. Bridging this gap is even harder in the case of unconstrained videos, mainly because of the lack of any prior information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework employs two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that uses commonsense knowledgebases. A commonsense ontology is created by integrating multiple structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the individual intermediate layers of the framework. The evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.
Pages: 1167-1191
Number of pages: 25
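
The following is a minimal illustrative sketch of the two-layer pipeline described in the abstract: low-level visual similarity matching against already-annotated videos, followed by an annotation analysis step that uses a commonsense knowledgebase. The paper does not publish code, so every name here (annotate_video, visual_similarity, related_concepts, top_k, the toy data in the usage example) is a hypothetical stand-in; feature extraction and knowledgebase lookups are supplied as plain callables rather than tied to any particular library.

```python
# Minimal sketch, assuming the two-layer design described in the abstract.
# All function and parameter names are illustrative, not the authors' API.
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple


def annotate_video(
    query_features: Sequence[float],
    annotated_corpus: Iterable[Tuple[Sequence[float], List[str]]],
    visual_similarity: Callable[[Sequence[float], Sequence[float]], float],
    related_concepts: Callable[[str], Iterable[str]],
    top_k: int = 5,
) -> List[str]:
    """Suggest textual annotations for an unconstrained query video."""
    # Layer 1: low-level visual similarity matching against a corpus of
    # already-annotated videos (features could be, e.g., SURF descriptors).
    ranked = sorted(
        ((visual_similarity(query_features, feats), tags)
         for feats, tags in annotated_corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]

    # Pool the annotations of the most similar videos, weighted by similarity.
    candidates = Counter()
    for score, tags in ranked:
        for tag in tags:
            candidates[tag] += score

    # Layer 2: annotation analysis with a commonsense knowledgebase
    # (ConceptNet-like relationships): a candidate tag gains support when
    # other candidate tags are semantically related to it.
    final_scores = {}
    for tag, weight in candidates.items():
        related = set(related_concepts(tag))
        support = sum(candidates[other] for other in related if other in candidates)
        final_scores[tag] = weight + support

    # Return the highest-scoring semantic annotations for the query video.
    return sorted(final_scores, key=final_scores.get, reverse=True)[:top_k]


# Example usage with toy data and a trivial dot-product similarity.
corpus = [([1.0, 0.0], ["running", "outdoor"]), ([0.9, 0.1], ["jogging", "park"])]
sim = lambda a, b: sum(x * y for x, y in zip(a, b))
neighbours = {"running": ["jogging"], "jogging": ["running"]}
print(annotate_video([1.0, 0.0], corpus, sim, lambda t: neighbours.get(t, []), top_k=2))
```

In this reading, Layer 2 reinforces candidate tags whose commonsense neighbours also appear among the candidates, which is one plausible way of integrating visual similarity with semantic relationships; the paper's actual scoring and ontology construction may differ.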