Multiple speaker tracking and detection: Handset normalization and duration scoring

Cited by: 1
Authors
Sönmez, K
Heck, L
Weintraub, M
Affiliations
[1] SRI Int, Menlo Park, CA 94025 USA
[2] Nuance Commun, Menlo Park, CA 94025 USA
Keywords
speaker tracking; verification; handset normalization
DOI
10.1006/dspr.1999.0368
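Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract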
We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking Switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported. © 2000 Academic Press.
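The scoring step described in the abstract (averaging frame scores over a segment, then normalizing with handset-dependent parameters that vary with segment length) can be illustrated with a minimal sketch. The abstract does not state the interpolation formula, so the linear interpolation below is an assumption, and every function name, parameter, and number is hypothetical rather than taken from the paper.

```python
import numpy as np

def segment_score(frame_scores):
    """Average per-frame scores (e.g. log-likelihood ratios) over a segment."""
    return float(np.mean(frame_scores))

def interpolate_params(duration, ref_durations, ref_means, ref_stds):
    """Estimate impostor-score mean and std for an arbitrary segment duration.

    ASSUMPTION: linear interpolation between parameters estimated at a few
    reference durations; the paper's actual scheme may differ.
    np.interp clamps to the end values outside the reference range.
    """
    mu = float(np.interp(duration, ref_durations, ref_means))
    sigma = float(np.interp(duration, ref_durations, ref_stds))
    return mu, sigma

def hnorm_score(frame_scores, frame_rate, handset_table):
    """HNORM-style normalization: (score - mu) / sigma for the detected handset.

    handset_table is a hypothetical tuple (ref_durations, ref_means, ref_stds)
    for one handset type, with durations in seconds.
    """
    duration = len(frame_scores) / frame_rate
    mu, sigma = interpolate_params(duration, *handset_table)
    return (segment_score(frame_scores) - mu) / sigma

# Hypothetical impostor statistics for one handset type (illustrative only).
example_table = (
    np.array([3.0, 10.0, 30.0]),   # reference durations in seconds
    np.array([-0.5, -0.4, -0.3]),  # impostor score means
    np.array([0.4, 0.3, 0.2]),     # impostor score standard deviations
)
```

Calling `hnorm_score(scores, 100.0, example_table)` on a segment of frame scores at 100 frames per second returns a score in units of impostor standard deviations, so segments of different lengths and handset types become comparable against a single threshold.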
Pages: 133-142
Page count: 10