Audio indexing: primary components retrieval - Robust classification in audio documents

Cited by: 8
Authors
Pinquier, Julien [1]
Andre-Obrecht, Regine [1]
Affiliations
[1] UPS, CNRS, INP, UMR 5505, Inst Rech Informat Toulouse, F-31062 Toulouse, France
Keywords
classification; indexing; audio documents; jingle; segmentation; duration; entropy; energy; spectral feature;
DOI
10.1007/s11042-006-0027-1
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This work addresses the soundtrack indexing of multimedia documents. Our purpose is to detect and locate sound units in order to structure the audio data flow in broadcast programs (reports). We present two audio classification tools that we have developed. The first one, a speech/music classification tool, is based on three original features: entropy modulation, stationary segment duration (obtained with a Forward-Backward Divergence algorithm) and number of segments. These are merged with the classical 4 Hz modulation energy. The tool is divided into two classifications (speech/non-speech and music/non-music) and achieves more than 90% accuracy for speech detection and 89% for music detection. The other system, a jingle identification tool, uses a Euclidean distance in the spectral domain to index the audio data flow. Results show that it is efficient: among 132 jingles to recognize, we have detected 130. The systems are tested on TV and radio corpora (more than 10 h). They are simple, robust and can be improved on every corpus without training or adaptation.
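As a rough illustration of the jingle identification idea mentioned in the abstract (a Euclidean distance in the spectral domain), the following minimal sketch slides a reference jingle over a stream of spectral vectors and reports where the mean frame-to-frame distance falls below a threshold. This is not the authors' implementation: the function names `euclidean` and `find_jingle`, the averaging over frames, the `threshold` parameter, and the toy two-dimensional "spectra" are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length spectral vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_jingle(stream, jingle, threshold):
    """Return start indices where the jingle appears to match the stream.

    stream, jingle: lists of per-frame spectral vectors (lists of floats).
    threshold: hypothetical decision threshold on the mean frame distance;
               the record does not state how the authors set theirs.
    """
    hits = []
    n = len(jingle)
    for start in range(len(stream) - n + 1):
        # Mean distance between the jingle and the aligned stream window.
        dist = sum(euclidean(stream[start + i], jingle[i]) for i in range(n)) / n
        if dist < threshold:
            hits.append(start)
    return hits

# Toy data: the two-frame jingle occurs exactly once, starting at frame 1.
jingle = [[1.0, 0.0], [0.0, 1.0]]
stream = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(find_jingle(stream, jingle, threshold=0.1))  # prints [1]
```

In practice the vectors would be short-term spectra computed from the audio signal, and the threshold would need tuning; neither detail is given in this record.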
Pages: 313-330
Number of pages: 18