Determining the best suited semantic events for cognitive surveillance

Cited by: 17
Authors
Fernandez, C. [1]
Baiget, P. [1]
Roca, F. X. [1]
Gonzalez, J. [1]
Affiliations
[1] UAB, Comp Vis Ctr, Barcelona 08193, Spain
Keywords
Cognitive surveillance; Event modeling; Content-based video retrieval; Ontologies; Advanced user interfaces; IMAGE; RETRIEVAL; TRACKING; OBJECT; SYSTEM;
DOI
10.1016/j.eswa.2010.09.070
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-of-the-art cognitive surveillance systems identify and describe complex events in selected domains, thus providing end-users with tools to easily access the contents of massive video footage. Nevertheless, as events grow in semantic complexity and the types of indoor/outdoor scenarios diversify, it becomes difficult to assess which events better describe the scene, and how to model them at a pixel level to fulfill natural language requests. We present an ontology-based methodology that guides the identification, step-by-step modeling, and generalization of the events most relevant to a specific domain. Our approach consists of three steps: (1) end-users provide textual evidence from surveilled video sequences; (2) transcriptions are analyzed top-down to build the knowledge bases for event description; and (3) the obtained models are used to generalize event detection to different image sequences from the surveillance domain. This framework produces user-oriented knowledge that improves on existing advanced interfaces for video indexing and retrieval by determining the best suited events for video understanding according to end-users. We have conducted experiments with outdoor and indoor scenes showing thefts, chases, and vandalism, demonstrating the feasibility and generalization of this proposal. © 2010 Elsevier Ltd. All rights reserved.
Pages: 4068-4079
Number of pages: 12