A COMPARISON OF APPROACHES FOR MODELING PROSODIC FEATURES IN SPEAKER RECOGNITION

Cited by: 13
Authors
Ferrer, Luciana [1 ]
Scheffer, Nicolas [1 ]
Shriberg, Elizabeth [1 ]
Affiliation
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
Speaker recognition; Prosody; Joint Factor Analysis; Support Vector Machines
DOI
10.1109/ICASSP.2010.5495632
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Prosodic information has been successfully used for speaker recognition for more than a decade. The best-performing prosodic system to date has been one based on features extracted over syllables obtained automatically from speech recognition output. The features are then transformed using a Fisher kernel, and speaker models are trained using support vector machines (SVMs). Recently, a simpler version of these features, based on pseudo-syllables, was shown to perform well when modeled using joint factor analysis (JFA). In this work, we study the two modeling techniques for the simpler set of features. We show that, for these features, a combination of JFA systems for different sequence lengths greatly outperforms both original modeling methods. Furthermore, we show that the combination of both methods gives significant improvements over the best single system. Overall, a performance improvement of 30% in the detection cost function (DCF) with respect to the two previously published methods is achieved using very simple strategies.
Pages: 4414-4417
Page count: 4
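
The abstract reports its headline result as a relative improvement in the detection cost function (DCF). As a rough illustration of what that metric measures, the sketch below computes a DCF from lists of target and impostor trial scores. The record does not state the cost parameters used in the paper; the defaults here (C_miss = 10, C_fa = 1, P_target = 0.01) are an assumption based on the values commonly used in NIST speaker recognition evaluations of that period, and the function names are hypothetical.

```python
def detection_cost(target_scores, impostor_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of miss and false-alarm rates at one threshold.

    The default costs and target prior are assumptions, not values
    taken from the paper's record.
    """
    # A miss: a true-speaker trial scored below the threshold.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # A false alarm: an impostor trial scored at or above the threshold.
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def min_dcf(target_scores, impostor_scores, **cost_params):
    """Smallest DCF over all thresholds that change the decisions."""
    candidates = sorted(set(target_scores) | set(impostor_scores))
    # One extra threshold above every score: reject all trials.
    candidates.append(candidates[-1] + 1.0)
    return min(
        detection_cost(target_scores, impostor_scores, t, **cost_params)
        for t in candidates
    )


def relative_improvement(baseline_dcf, new_dcf):
    """Fractional reduction in DCF relative to a baseline system."""
    return (baseline_dcf - new_dcf) / baseline_dcf
```

Under this reading, a 30% improvement would correspond to `relative_improvement(baseline_dcf, new_dcf)` returning about 0.3, that is, the new system's DCF being roughly 0.7 times the baseline's.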