Speaker verification using adapted Gaussian mixture models

Cited by: 2878
Authors
Reynolds, DA [1]
Quatieri, TF [1]
Dunn, RB [1]
Affiliation
[1] MIT, Lincoln Lab, Speech Syst Technol Grp, Lexington, MA 02420 USA
Keywords
speaker recognition; Gaussian mixture models; likelihood ratio detector; universal background model; handset normalization; NIST evaluation;
DOI
10.1006/dspr.1999.0361
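Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract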
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented. © 2000 Academic Press.
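The likelihood ratio test summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes diagonal-covariance GMMs, mean-only adaptation, and a relevance factor of 16, none of which the abstract specifies. The function names (`log_likelihood`, `adapt_means`, `llr_score`) are hypothetical and chosen only for this example.

```python
import numpy as np

def _log_components(x, weights, means, variances):
    """Weighted per-component log densities, shape (T, M).
    x: (T, D) frames; weights: (M,); means, variances: (M, D), diagonal covariances."""
    diff = x[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    log_gauss = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss

def log_likelihood(x, weights, means, variances):
    """Per-frame log p(x_t | GMM), computed with a stable log-sum-exp."""
    lc = _log_components(x, weights, means, variances)
    peak = lc.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(lc - peak).sum(axis=1))

def adapt_means(x, weights, means, variances, relevance=16.0):
    """One pass of Bayesian (MAP-style) mean adaptation from the UBM.
    Assumption: only means are adapted; `relevance` is an illustrative default."""
    lc = _log_components(x, weights, means, variances)
    post = np.exp(lc - lc.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)          # component posteriors, (T, M)
    n = post.sum(axis=0)                             # soft counts per component
    data_mean = (post.T @ x) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]           # data-dependent mixing weight
    return alpha * data_mean + (1.0 - alpha) * means

def llr_score(x, speaker_means, weights, ubm_means, variances):
    """Average log-likelihood ratio: speaker model versus UBM."""
    spk = log_likelihood(x, weights, speaker_means, variances)
    ubm = log_likelihood(x, weights, ubm_means, variances)
    return float(np.mean(spk - ubm))
```

In this sketch a test utterance would be accepted when `llr_score` exceeds a chosen threshold. Components that see little enrollment data stay close to the UBM because their mixing weight `alpha` is small.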
Pages: 19-41
Page count: 23
Related papers
33 in total
  • [1] Reynolds D. A., 1992, GAUSSIAN MIXTURE MOD
  • [2] [Anonymous], 1997, Proceedings of the European Conference on Speech Communication and Technology
  • [3] CAREY MJ, 1991, P INT C AC SPEECH SI, P397
  • [4] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
  • [5] DODDINGTON G, IN PRESS SPEECH COMM
  • [6] Approaches to speaker detection and tracking in conversational speech
    Dunn, RB
    Reynolds, DA
    Quatieri, TF
    [J]. DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3) : 93 - 112
  • [7] Fukunaga K., 1972, Introduction to statistical pattern recognition
  • [8] Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains
    Gauvain, Jean-Luc
    Lee, Chin-Hui
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02) : 291 - 298
  • [9] Hart P.E., 1973, Pattern recognition and scene analysis
  • [10] HECK LP, 1997, P ICASSP, P1071