Investigation of broadcast-audio semantic analysis scenarios employing radio-programme-adaptive pattern classification

Cited by: 33
Authors
Kotsakis, R. [1]
Kalliris, G. [1]
Dimoulas, C. [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Journalism & Mass Commun, Lab Elect Media, Thessaloniki, Greece
Keywords
Audio-semantics; Radio-programmes; Content-management; Speech/non-speech segmentation; Pattern classification; Neural networks; NEWS TRANSCRIPTION SYSTEM; SPEECH RECOGNITION; BIOACOUSTICS APPLICATION; LONG-TERM; SEGMENTATION; FEATURES; WAVELETS
DOI
10.1016/j.specom.2012.01.004
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The present paper focuses on the investigation of various audio pattern classifiers in broadcast-audio semantic analysis, using radio-programme-adaptive classification strategies with supervised training. Multiple neural network topologies and training configurations are evaluated and compared in combination with feature-extraction, ranking and feature-selection procedures. Different pattern classification taxonomies are implemented, using programme-adapted multi-class definitions and hierarchical schemes. Hierarchical and hybrid classification taxonomies are deployed in speech analysis tasks, facilitating efficient speaker recognition/identification, speech/music discrimination, and generally speech/non-speech detection-segmentation. Exhaustive qualitative and quantitative evaluation is conducted, including indicative comparison with non-neural approaches. Hierarchical approaches offer classification-similarities for easy adaptation to generic radio-broadcast semantic analysis tasks. The proposed strategy exhibits increased efficiency in radio-programme content segmentation and classification, which is one of the most demanding audio semantics tasks. This strategy can be easily adapted in broader audio detection and classification problems, including additional real-world speech-communication demanding scenarios. (C) 2012 Elsevier B.V. All rights reserved.
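The hierarchical scheme mentioned in the abstract (a coarse speech/non-speech decision followed by a finer decision on speech only) can be illustrated with a minimal sketch. This is not the authors' implementation: simple nearest-centroid models stand in for the neural networks evaluated in the paper, and the class names, feature values, and the NearestCentroid and HierarchicalClassifier helpers are all hypothetical.

```python
# Illustrative two-stage (hierarchical) classifier. Nearest-centroid models
# stand in for the paper's neural networks; all feature values are made up.
from math import dist


class NearestCentroid:
    """Assigns a feature vector to the class with the closest mean vector."""

    def fit(self, vectors, labels):
        grouped = {}
        for vec, label in zip(vectors, labels):
            grouped.setdefault(label, []).append(vec)
        # Per-class centroid: the column-wise mean of that class's vectors.
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, vec):
        return min(self.centroids, key=lambda label: dist(vec, self.centroids[label]))


class HierarchicalClassifier:
    """Stage 1 picks the coarse class; stage 2 refines speech frames only."""

    def fit(self, vectors, coarse_labels, fine_labels):
        self.coarse = NearestCentroid().fit(vectors, coarse_labels)
        # The second-stage model is trained on the speech subset alone.
        speech = [
            (v, f)
            for v, c, f in zip(vectors, coarse_labels, fine_labels)
            if c == "speech"
        ]
        self.fine = NearestCentroid().fit(
            [v for v, _ in speech], [f for _, f in speech]
        )
        return self

    def predict(self, vec):
        coarse = self.coarse.predict(vec)
        if coarse != "speech":
            return coarse, None  # no finer label for non-speech content
        return coarse, self.fine.predict(vec)


# Hypothetical 2-D features per audio frame (e.g. energy, zero-crossing rate).
X = [(0.2, 0.8), (0.3, 0.9), (0.25, 0.4), (0.35, 0.5), (0.9, 0.1), (0.8, 0.2)]
coarse = ["speech", "speech", "speech", "speech", "music", "music"]
fine = ["speaker_a", "speaker_a", "speaker_b", "speaker_b", None, None]

model = HierarchicalClassifier().fit(X, coarse, fine)
print(model.predict((0.22, 0.85)))
print(model.predict((0.85, 0.15)))
```

Splitting the decision this way lets the second-stage model be retrained per programme without touching the coarse speech/non-speech stage, which is one plausible reading of the adaptability the abstract attributes to hierarchical approaches.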
Pages: 743-762
Page count: 20