A Discriminative Performance Metric for GMM-UBM Speaker Identification

Cited by: 0
Authors
Dehzangi, Omid [1]
Ma, Bin
Chng, Eng Siong [1]
Li, Haizhou [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010
Keywords
speaker identification; GMM-UBM; discriminative performance metric;
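DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract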
Gaussian mixture modeling with a universal background model (GMM-UBM) is a widely used method for speaker identification, in which a GMM characterizes a specific speaker's voice. The model parameters are generally estimated under the maximum likelihood (ML) or maximum a posteriori (MAP) criterion. As a result, interspeaker information that discriminates between different speakers is not taken into account. To overcome this limitation, we design a discriminative performance metric that captures interspeaker variability and thereby improves the classification capability of the speaker models. A learning algorithm is presented that tunes the Gaussian mixture weights by optimizing the frame classification accuracy of the GMM classifiers. We design an objective function that directly relates the model parameters to the performance metric. The proposed method is compared with the baseline GMM-UBM system on the 2001 NIST SRE corpus. Experimental results demonstrate that the proposed learning algorithm considerably improves the GMM-UBM system on speaker identification.
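As a rough illustration of the quantity the abstract calls frame classification accuracy, the sketch below scores each feature frame against every speaker's GMM and counts how often the true speaker scores highest. It is not the authors' objective function or learning algorithm, which the abstract does not specify. The function names, the diagonal-covariance assumption, and the toy one-dimensional models (two speakers sharing means and variances and differing only in mixture weights) are all hypothetical choices made for the example.

```python
import math

def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one frame x under a diagonal-covariance GMM.

    Assumes all mixture weights are strictly positive.
    """
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comp_logs.append(ll)
    # Log-sum-exp over components for numerical stability.
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))

def frame_accuracy(frames, labels, models):
    """Fraction of frames whose highest-scoring speaker model is the true one."""
    correct = 0
    for x, y in zip(frames, labels):
        scores = {spk: gmm_frame_loglik(x, *params) for spk, params in models.items()}
        if max(scores, key=scores.get) == y:
            correct += 1
    return correct / len(frames)

# Toy setup (hypothetical): shared means and variances, speaker-specific weights.
means = [[-1.0], [1.0]]
variances = [[1.0], [1.0]]
models = {
    "A": ([0.8, 0.2], means, variances),
    "B": ([0.2, 0.8], means, variances),
}
frames = [[-1.0], [1.0], [-0.5], [0.5]]
labels = ["A", "B", "A", "A"]
accuracy = frame_accuracy(frames, labels, models)
```

In this toy setup only the mixture weights separate the two speakers, so any change to the weights directly changes which frames are classified correctly. That is the kind of dependence a weight-tuning procedure would exploit.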
Pages: 2114 / +
Page count: 2