MULTILEVEL SPEECH INTELLIGIBILITY FOR ROBUST SPEAKER RECOGNITION

Times cited: 0
Authors
Nemala, Sridhar Krishna [1]
Elhilali, Mounya [1]
Affiliation
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Ctr Speech & Language Proc, Baltimore, MD 21218 USA
Source
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012
Keywords
Speech intelligibility; Voice-activity detection; Speaker recognition; Noise robustness
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In the real world, natural conversational speech is an amalgam of speech segments, silences, environmental/background noise, and channel effects. Labeling the different regions of an acoustic signal according to their information level would greatly benefit automatic speech processing tasks. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed parsing approach exploits higher-level perceptual information about signal intelligibility levels. This labeling information is integrated into a novel multilevel framework for automatic speaker recognition. The system processes the input acoustic signal along independent streams that reflect different levels of intelligibility, and then fuses the decision scores from the multiple streams according to their intelligibility contribution. Our results show that the proposed system achieves significant improvements over standard baseline and VAD-based approaches, and attains performance similar to that obtained with oracle speech segmentation information.
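The abstract describes the multilevel framework only at a high level. Frames are labeled by intelligibility level, each level is processed as an independent stream, and the streams' decision scores are fused according to their intelligibility contribution. The sketch below illustrates that flow. It is not the authors' method, and it rests on assumptions made here for illustration: per-frame intelligibility scores are taken as already computed, levels come from fixed ascending thresholds, and fusion is a normalized weighted sum of per-stream speaker scores with hand-set weights. All function names (`label_segments`, `split_streams`, `fuse_scores`, `decide`) are hypothetical.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def label_segments(frame_scores: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    # Assumption: each frame already has a scalar intelligibility score.
    # Level 0 is the least intelligible; len(thresholds) is the most.
    # The thresholds must be sorted in ascending order.
    return [sum(1 for t in thresholds if s >= t) for s in frame_scores]


def split_streams(frames: Sequence[T], levels: Sequence[int]) -> Dict[int, List[T]]:
    # Route each frame to the independent stream for its intelligibility level.
    streams: Dict[int, List[T]] = {}
    for frame, level in zip(frames, levels):
        streams.setdefault(level, []).append(frame)
    return streams


def fuse_scores(
    stream_scores: Dict[int, Dict[str, float]], weights: Dict[int, float]
) -> Dict[str, float]:
    # Hypothetical fusion rule: a weighted sum of per-stream speaker scores,
    # with weights renormalized over the streams actually present.
    # Assumes every stream scores the same set of speakers.
    total = sum(weights[level] for level in stream_scores)
    if total <= 0:
        raise ValueError("at least one present stream needs a positive weight")
    fused: Dict[str, float] = {}
    for level, scores in stream_scores.items():
        w = weights[level] / total
        for speaker, score in scores.items():
            fused[speaker] = fused.get(speaker, 0.0) + w * score
    return fused


def decide(fused: Dict[str, float]) -> str:
    # Identification decision: the speaker with the highest fused score.
    return max(fused, key=lambda spk: fused[spk])


if __name__ == "__main__":
    levels = label_segments([0.1, 0.5, 0.9, 0.7], thresholds=[0.3, 0.6])
    print(levels)  # [0, 1, 2, 2]
    print(split_streams(["f0", "f1", "f2", "f3"], levels))
    fused = fuse_scores(
        {1: {"alice": -1.0, "bob": -2.0}, 2: {"alice": -0.5, "bob": -0.2}},
        weights={0: 0.0, 1: 1.0, 2: 3.0},
    )
    print(decide(fused))  # alice
```

In this sketch, setting a level's weight to 0 discards that stream entirely, which loosely mirrors a binary VAD-style front end. Graded weights let partially intelligible regions still contribute in proportion to their assumed reliability.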
Pages: 4393-4396
Number of pages: 4