Discovering meaningful multimedia patterns with audio-visual concepts and associated text

Cited by: 0
Authors
Xie, L [1]
Kennedy, L [1]
Chang, SF [1]
Divakaran, A [1]
Sun, H [1]
Lin, CY [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Source
ICIP: 2004 International Conference on Image Processing, Vols 1-5 | 2004
Keywords
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
The work presents the first effort to automatically annotate the semantic meanings of temporal video patterns obtained through unsupervised discovery processes. This problem is interesting in domains where neither perceptual patterns nor semantic concepts have simple structures. The patterns in video are modeled with hierarchical hidden Markov models (HHMM), with efficient algorithms to learn the parameters, the model complexity, and the relevant features; the meanings are contained in words of the speech transcript of the video. The pattern-word association is obtained via co-occurrence analysis and statistical machine translation models. Promising results are obtained through extensive experiments on 20+ hours of TRECVID news videos: video patterns that associate with distinct topics such as El Niño and politics are identified, and the HHMM temporal structure model compares favorably to a non-temporal clustering algorithm.
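The co-occurrence step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the segment representation (a pattern label paired with transcript words), and the toy data are all assumptions made for illustration, and the statistical machine translation refinement is not shown.

```python
from collections import Counter

def cooccurrence(segments):
    """Count how often each word appears in segments carrying each pattern label.

    `segments` is a hypothetical list of (pattern_label, words) pairs,
    one pair per video segment.
    """
    joint = Counter()         # (label, word) -> number of segments containing both
    label_counts = Counter()  # label -> number of segments with that label
    for label, words in segments:
        label_counts[label] += 1
        for word in set(words):  # count each word at most once per segment
            joint[(label, word)] += 1
    return joint, label_counts

def word_given_label(joint, label_counts):
    """Estimate P(word | label) as a relative frequency over segments."""
    return {(label, word): count / label_counts[label]
            for (label, word), count in joint.items()}

# Toy data, invented for illustration only.
segments = [
    (0, ["storm", "weather"]),
    (0, ["weather", "rain"]),
    (1, ["senate", "vote"]),
]
joint, label_counts = cooccurrence(segments)
probs = word_given_label(joint, label_counts)
```

Words whose conditional frequency is high for one pattern label and low for the others are candidate annotations for that pattern.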
Pages: 2383-2386
Page count: 4