A speaker clustering algorithm for fast speaker adaptation in continuous speech recognition

Cited by: 0
Authors
Rodríguez, LJ [1]
Torres, MI [1]
Affiliation
[1] Univ Basque Country, Fac Ciencia & Tecnol, Pattern Recognit & Speech Technol Grp, DEE, E-48080 Bilbao, Spain
Source
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2004 / Vol. 3206
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a speaker adaptation methodology that first automatically determines a number of speaker clusters in the training material, then estimates the parameters of the corresponding models, and finally applies a fast match strategy, based on the so-called histogram models, to choose the optimal cluster for each test utterance. The fast match strategy is critical to making this methodology useful in real applications, since carrying out several recognition passes (one for each cluster of speakers) and then selecting the decoded string with the highest likelihood would be too costly. Preliminary experiments on two Spanish speech databases show that both the clustering algorithm and the fast match strategy are consistent and reliable. The histogram models, though suboptimal (they guessed the right cluster for unseen test speakers in 85% of the cases with read speech and in 63% of the cases with spontaneous speech), yielded a decrease of around 6% in error rate in phonetic recognition experiments.
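The cluster selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the histogram models, so this example assumes each cluster is represented by a smoothed histogram over vector-quantizer codeword indices and that an utterance is assigned to the cluster with the highest log-likelihood. The function names `train_histogram` and `choose_cluster`, the additive smoothing, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_histogram(codeword_ids, codebook_size, smoothing=1.0):
    """Estimate smoothed log-probabilities of codeword indices for one cluster.

    Assumption: a "histogram model" is a discrete distribution over
    vector-quantizer codeword indices (not confirmed by the abstract).
    """
    counts = Counter(codeword_ids)
    total = len(codeword_ids) + smoothing * codebook_size
    return [
        math.log((counts.get(k, 0) + smoothing) / total)
        for k in range(codebook_size)
    ]


def choose_cluster(utterance_ids, cluster_histograms):
    """Return (best cluster index, per-cluster log-likelihood scores)."""
    scores = [
        sum(hist[k] for k in utterance_ids)  # frames treated as independent
        for hist in cluster_histograms
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: two clusters, a codebook of 4 codewords (illustrative only).
histograms = [
    train_histogram([0, 0, 0, 1], codebook_size=4),
    train_histogram([2, 3, 3, 3], codebook_size=4),
]
best_cluster, cluster_scores = choose_cluster([3, 3, 2], histograms)
```

Under these assumptions, scoring an utterance costs one table lookup per frame and per cluster, which is why such a selection step can be much cheaper than running one full recognition pass per cluster.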
Pages: 433-440
Number of pages: 8