A Large-Scale Open-Source Acoustic Simulator for Speaker Recognition

Cited by: 16
Authors
Ferras, Marc [1]
Madikeri, Srikanth [1]
Motlicek, Petr [1]
Dey, Subhadeep [1]
Bourlard, Herve [1]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
Keywords
Codec; degraded speech; noise; robustness; simulation; speaker recognition
DOI
10.1109/LSP.2016.2537844
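Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract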
The state-of-the-art speaker-recognition systems suffer from significant performance loss on degraded speech conditions and acoustic mismatch between enrolment and test phases. Past international evaluation campaigns, such as the NIST speaker recognition evaluation (SRE), have partly addressed these challenges in some evaluation conditions. This work aims at further assessing and compensating for the effect of a wide variety of speech-degradation processes on speaker-recognition performance. We present an open-source simulator generating degraded telephone, VoIP, and interview-speech recordings using a comprehensive list of narrow-band, wide-band, and audio codecs, together with a database of over 60 h of environmental noise recordings and over 100 impulse responses collected from publicly available data. We provide speaker-verification results obtained with an i-vector-based system using either a clean or degraded PLDA back-end on a NIST SRE subset of data corrupted by the proposed simulator. While error rates increase considerably under degraded speech conditions, large relative equal error rate (EER) reductions were observed when using a PLDA model trained with a large number of degraded sessions per speaker.
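The two most basic degradations the abstract mentions, additive environmental noise and convolution with an impulse response, can be illustrated with a minimal sketch. This is not the authors' simulator: the function names `mix_at_snr` and `reverberate`, the use of NumPy, and the synthetic signals are assumptions made purely for illustration, and codec processing is left out entirely.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; assumes the noise is not all zeros.
    """
    # Tile, then trim, the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that scales the noise to the target speech-to-noise power ratio.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def reverberate(speech, impulse_response):
    """Convolve speech with an impulse response, keeping the input length."""
    return np.convolve(speech, impulse_response)[: len(speech)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)   # stand-in for one second of speech
    noise = rng.standard_normal(4000)     # stand-in for a noise recording
    rir = np.array([1.0, 0.0, 0.5])       # toy impulse response with one echo
    degraded = mix_at_snr(reverberate(speech, rir), noise, snr_db=10.0)
    print(degraded.shape)
```

Applying the impulse response before adding the noise is one possible ordering chosen for this sketch; the record does not state the order in which the simulator applies its degradations.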
Pages: 527-531
Number of pages: 5