Modality Mixture Projections for Semantic Video Event Detection

Cited by: 56
Authors
Shen, Jialie [1]
Tao, Dacheng [2]
Li, Xuelong [3]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[3] Univ London, Sch Comp Sci & Informat Syst, London WC1E 7HX, England
Keywords
Multimodule; semantic event video detection
DOI
10.1109/TCSVT.2008.2005607
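Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809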
Abstract
Event detection is one of the most fundamental components of video information systems across a wide range of application domains. In recent years, it has attracted considerable interest from practitioners and academics in many different areas. While video event detection has been the subject of extensive research, far fewer existing approaches have considered multimodal information and the related efficiency issues. In this paper, we use a subspace selection technique to achieve fast and accurate video event detection. The approach is capable of discriminating between different classes while preserving the intramodal geometry of samples within the same class. With this method, feature vectors representing different kinds of multimodal data can easily be projected from different identities and modalities onto a unified subspace, on which the recognition process can be performed. Furthermore, the training stage is carried out only once, yielding a single unified transformation matrix that projects all of the different modalities. Unlike existing multimodal detection systems, the new system works well when some modalities are not available. Experimental results on soccer video and TRECVID news video collections demonstrate the effectiveness, efficiency and robustness of the proposed NIMP for individual recognition tasks in comparison with existing approaches.
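The abstract describes training a single transformation matrix once, projecting feature vectors from several modalities onto a shared discriminative subspace, and still detecting events when a modality is unavailable. The Python sketch below illustrates only that general idea under stated assumptions; it is not the authors' NIMP/modality mixture projection algorithm. It swaps in a plain two-class Fisher linear discriminant (with no intramodal geometry-preserving term), concatenates hypothetical visual and audio feature blocks, and, as an assumption of this sketch, fills a missing modality block with its training mean. The names `UnifiedProjectionDetector`, `fisher_direction`, `solve`, the `ridge` parameter and the toy data are all hypothetical.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean(rows: Sequence[Vector]) -> Vector:
    """Column-wise mean of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(neg: Sequence[Vector], pos: Sequence[Vector], ridge: float = 1e-3) -> Vector:
    """w = (S_w + ridge * I)^-1 (mu_pos - mu_neg): a 1-D discriminative subspace (stand-in technique)."""
    d = len(neg[0])
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows in (neg, pos):  # accumulate the within-class scatter of each class
        mu = mean(rows)
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    mu_neg, mu_pos = mean(neg), mean(pos)
    return solve(sw, [p - q for p, q in zip(mu_pos, mu_neg)])


class UnifiedProjectionDetector:
    """Hypothetical detector: ONE projection w shared by all concatenated modality blocks."""

    def __init__(self, neg: Sequence[Vector], pos: Sequence[Vector], dims: Sequence[int]):
        self.dims = list(dims)                   # size of each modality block, e.g. [visual, audio]
        self.fill = mean(list(neg) + list(pos))  # assumption: training mean stands in for a missing block
        self.w = fisher_direction(neg, pos)      # trained once, reused for every modality
        proj_neg = sum(self._score(x) for x in neg) / len(neg)
        proj_pos = sum(self._score(x) for x in pos) / len(pos)
        self.threshold = 0.5 * (proj_neg + proj_pos)  # midpoint of projected class means

    def _score(self, x: Vector) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def assemble(self, blocks: Sequence[Optional[Vector]]) -> Vector:
        """Concatenate modality blocks; None marks an unavailable modality."""
        x: Vector = []
        start = 0
        for size, block in zip(self.dims, blocks):
            if block is None:
                block = self.fill[start:start + size]
            elif len(block) != size:
                raise ValueError("modality block has the wrong length")
            x.extend(block)
            start += size
        return x

    def detect(self, blocks: Sequence[Optional[Vector]]) -> bool:
        return self._score(self.assemble(blocks)) > self.threshold


# Toy data: [visual_1, visual_2, audio_1, audio_2]; pos = event (e.g. a goal), neg = no event.
neg = [[0.1, 0.2, 0.0, 0.1], [0.2, 0.1, 0.1, 0.0], [0.0, 0.0, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1]]
pos = [[0.9, 0.8, 1.0, 0.9], [1.0, 0.9, 0.8, 1.0], [0.8, 1.0, 0.9, 0.8], [0.9, 0.9, 0.9, 0.9]]
det = UnifiedProjectionDetector(neg, pos, dims=[2, 2])
print(det.detect([[0.9, 0.9], None]))  # audio unavailable -> True
print(det.detect([None, [0.1, 0.1]]))  # visual unavailable -> False
```

With balanced classes, as in the toy data, the training mean equals the midpoint of the two class means, so a mean-filled block adds nothing to the score relative to the threshold and the decision rests on the modalities that are actually present. The published method instead learns a projection that also preserves intramodal geometry, which this sketch does not attempt.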
Pages: 1587-1596
Number of pages: 10