Real-time speaker identification and verification

Cited by: 119
Authors
Kinnunen, T [1]
Karpov, E [1]
Fränti, P [1]
Affiliations
[1] Univ Joensuu, Dept Comp Sci, FIN-80101 Joensuu, Finland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 1
Keywords
Gaussian mixture model (GMM); pre-quantization; real-time; speaker pruning; speaker recognition; vector quantization (VQ);
DOI
10.1109/TSA.2005.853206
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In speaker identification, most of the computation originates from the distance or likelihood computations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We also apply the algorithms to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 in the case of VQ-based modeling with minor degradation in the identification accuracy, and 34:1 in the case of GMM-based modeling. An equal error rate of 7% can be reached in 0.84 s on average when the length of the test utterance is 30.4 s.
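The two speed-up ideas named in the abstract, pre-quantizing the test sequence and pruning unlikely speakers during matching, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which pre-quantization or pruning variants are used, so block averaging and "drop the worst-scoring speakers after every interval" are assumptions chosen for brevity, and the names `prequantize`, `identify`, `block`, `interval` and `drop` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prequantize(vectors: List[Vector], block: int) -> List[List[float]]:
    """Assumed pre-quantizer: replace each run of `block` consecutive
    vectors by its mean, shrinking the test sequence before matching."""
    reduced = []
    for i in range(0, len(vectors), block):
        chunk = vectors[i:i + block]
        dim = len(chunk[0])
        reduced.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return reduced


def identify(vectors: List[Vector],
             codebooks: Dict[str, List[Vector]],
             interval: int,
             drop: int) -> str:
    """VQ matching with an assumed pruning rule: after every `interval`
    vectors, discard the `drop` speakers with the largest accumulated
    quantization distortion, always keeping at least one speaker."""
    scores = {name: 0.0 for name in codebooks}
    for start in range(0, len(vectors), interval):
        for v in vectors[start:start + interval]:
            for name in scores:
                # Distortion of v against this speaker's nearest code vector.
                scores[name] += min(sq_dist(v, c) for c in codebooks[name])
        if len(scores) > 1:
            ranked = sorted(scores, key=scores.get)  # best (lowest) first
            keep = max(1, len(ranked) - drop)
            scores = {name: scores[name] for name in ranked[:keep]}
    return min(scores, key=scores.get)
```

Both steps cut the number of distance computations, which the abstract identifies as the dominant cost: pre-quantization reduces how many test vectors are matched, and pruning reduces how many speaker models each remaining vector is matched against.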
Pages: 277-288
Page count: 12