Smooth Nonnegative Matrix Factorization for Unsupervised Audiovisual Document Structuring

Times cited: 38
Authors
Essid, Slim [1]
Fevotte, Cedric [1]
Affiliation
[1] Telecom ParisTech, CNRS LTCI, F-75014 Paris, France
Keywords
Bag of features; content structuring; indexing; machine learning; matrix factorization; unsupervised classification; videos; ALGORITHMS; PARTS
DOI
10.1109/TMM.2012.2228474
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This paper introduces a new paradigm for unsupervised audiovisual document structuring. A novel Nonnegative Matrix Factorization (NMF) algorithm is applied to histograms of counts, obtained from a bag-of-features representation of the content, to jointly discover latent structuring patterns and their activations over time. Our NMF variant uses the Kullback-Leibler divergence as its cost function and imposes a temporal smoothness constraint on the activations; the resulting problem is solved with a majorization-minimization technique. The proposed approach is designed to be generic and is particularly well suited to applications where the structuring patterns may overlap in time. We evaluate it on two person-oriented video structuring tasks, one using the visual modality and the other the audio modality, on a challenging database of political debate videos. Our results outperform reference results obtained with a method based on Hidden Markov Models. Furthermore, we show the potential of our general approach for audio speaker diarization.
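The abstract combines three ingredients:

- a Kullback-Leibler data term;
- a temporal smoothness constraint on the activations;
- a majorization-minimization (MM) solver.

The Python sketch that follows illustrates how these can fit together. It is not the authors' implementation, and it rests on the following assumptions:

- **Input:** the count matrix V is F x N (F codewords, N time frames).
- **Smoothness penalty:** a quadratic one, (lam/2) * sum_k sum_n (h_kn - h_k,n-1)^2.
- **Scale constraint:** the columns of W are normalized to sum to one, which removes the scale ambiguity between W and H.
- **Majorizer for the penalty:** a separable quadratic, so that each activation update becomes the positive root of a quadratic equation.

The function names (`smooth_kl_nmf`, `update_H`, `update_W`, `objective`) and the parameter `lam` are hypothetical. The H step is a genuine MM step and cannot increase the criterion. The W step is the standard KL multiplicative update followed by renormalization, a common heuristic that does not by itself guarantee monotone descent.

```python
import numpy as np


def kl_div(V, WH, eps=1e-12):
    """Generalized KL divergence D(V | WH) summed over all entries (0 log 0 = 0)."""
    ratio = np.where(V > 0, V / (WH + eps), 1.0)
    return float(np.sum(V * np.log(ratio) - V + WH))


def objective(V, W, H, lam):
    """KL data term plus an assumed quadratic smoothness penalty on the rows of H."""
    return kl_div(V, W @ H) + 0.5 * lam * float(np.sum(np.diff(H, axis=1) ** 2))


def update_H(V, W, H, lam, eps=1e-12):
    """One MM step for H with W fixed (assumes every column of W has a positive sum).

    Up to the small eps guard, this step never increases objective().
    """
    K, N = H.shape
    a = W.sum(axis=0)[:, None]            # sum_f w_fk, shape (K, 1)
    p = H * (W.T @ (V / (W @ H + eps)))   # numerator from the Jensen majorizer of the KL term
    # Each pair term (h_n - h_{n-1})^2 is majorized by 2(h_n - c)^2 + 2(h_{n-1} - c)^2,
    # where c is the midpoint of the current pair (tight at the current H).
    # m counts the pairs touching h_n and s sums their midpoints.
    m = np.zeros_like(H)
    s = np.zeros_like(H)
    if N > 1:
        mid = 0.5 * (H[:, :-1] + H[:, 1:])
        m[:, 1:] += 1.0
        s[:, 1:] += mid                   # pair (n-1, n) for n >= 1
        m[:, :-1] += 1.0
        s[:, :-1] += mid                  # pair (n, n+1) for n <= N-2
    # Setting the majorizer's derivative to zero gives 2*lam*m*h^2 + b*h - p = 0
    b = a - 2.0 * lam * s
    disc = np.sqrt(b ** 2 + 8.0 * lam * m * p)
    H_new = np.empty_like(H)
    pos = b > 0
    # Two algebraically equal forms of the positive root, chosen to avoid cancellation
    H_new[pos] = 2.0 * p[pos] / (b[pos] + disc[pos])
    neg = ~pos                            # b <= 0 with a > 0 implies lam > 0 and m > 0
    H_new[neg] = (-b[neg] + disc[neg]) / (4.0 * lam * m[neg])
    return H_new


def update_W(V, W, H, eps=1e-12):
    """KL multiplicative update for W, then columns rescaled to sum to one (heuristic)."""
    W = W * ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W / (W.sum(axis=0, keepdims=True) + eps)


def smooth_kl_nmf(V, K, lam=1.0, n_iter=200, seed=0):
    """Factorize a nonnegative count matrix V (F x N) as W (F x K) @ H (K x N)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((K, N)) + 0.1
    for _ in range(n_iter):
        H = update_H(V, W, H, lam)
        W = update_W(V, W, H)
    return W, H


if __name__ == "__main__":
    # Synthetic count histograms: two patterns whose activations overlap in time
    rng = np.random.default_rng(1)
    F, N = 20, 100
    W_true = rng.random((F, 2))
    W_true /= W_true.sum(axis=0)
    t = np.arange(N)
    H_true = 50.0 * np.vstack([t < 60, t >= 40]).astype(float)
    V = rng.poisson(W_true @ H_true).astype(float)
    W, H = smooth_kl_nmf(V, K=2, lam=0.1)
    print("final objective:", objective(V, W, H, 0.1))
```

**How `lam` acts.** With `lam=0`, the H step reduces to the classical KL multiplicative rule h <- p / sum_f w_fk. Larger values pull each activation toward its temporal neighbours.

**Reading the result.** Each row of H is the activation curve of one discovered pattern, so several patterns can be active in the same frame. Thresholding these curves is one simple (assumed) way to turn them into possibly overlapping segments.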
Pages: 415-425
Number of pages: 11