Deep Neural Network Approaches to Speaker and Language Recognition

Times cited: 283
Authors
Richardson, Fred [1]
Reynolds, Douglas [1]
Dehak, Najim [2]
Affiliations
[1] MIT, Lincoln Lab, Lexington, MA 02421 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Keywords
Bottleneck features; DNN; i-vector; language recognition; senone posteriors; speaker recognition; tandem features; FEATURES;
DOI
10.1109/LSP.2015.2420092
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809
Abstract
The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR). Prior work has shown performance gains on separate SR and LR tasks using DNNs either for direct classification or for feature extraction. In this work we present the application of a single DNN to both SR and LR, using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks. Using a single DNN trained for ASR on Switchboard data, we demonstrate large performance gains on both benchmarks: a 55% reduction in equal error rate (EER) for the DAC13 out-of-domain condition and a 48% reduction in average cost (C-avg) on the LRE11 30 s test condition. It is also shown that further gains are possible using score or feature fusion, leading to the possibility of a single i-vector extractor producing state-of-the-art SR and LR performance.
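The speaker recognition results above are reported as relative reductions in EER. As a hedged illustration of how that metric is commonly computed, here is a minimal sketch, assuming raw detection scores for target and non-target trials and a simple threshold sweep over the observed scores. It is not the NIST scoring tool or the authors' code, the function names `compute_eer` and `relative_reduction` are hypothetical, and the scores in the test are synthetic.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the operating point where the miss rate
    (target trials rejected) equals the false-alarm rate
    (non-target trials accepted). Assumes higher score = more target-like."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    # Candidate thresholds: every observed score (a simple sweep, no interpolation).
    thresholds = np.sort(np.concatenate([tgt, non]))
    # Decision rule assumed here: accept a trial when score >= threshold.
    p_miss = np.array([np.mean(tgt < t) for t in thresholds])
    p_fa = np.array([np.mean(non >= t) for t in thresholds])
    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[idx] + p_fa[idx]) / 2.0


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline
```

For example, a drop from a hypothetical baseline EER of 10.0% to 4.5% gives `relative_reduction(10.0, 4.5)` of 0.55, i.e. the kind of 55% relative reduction quoted in the abstract. The threshold sweep keeps the sketch short; production scorers usually interpolate the DET curve instead.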
Pages: 1671-1675
Page count: 5