Joint estimation of feature transformation parameters and Gaussian mixture model for speaker identification

Cited by: 12
Authors
Yuo, KH [1]
Wang, HC [1]
Affiliation
[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu 30043, Taiwan
Keywords
Karhunen-Loeve transform; transformation embedded GMM; generalized covariance matrices
DOI
10.1016/S0167-6393(99)00017-5
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The Karhunen-Loeve transform is a well-known technique for orthonormally mapping features into an uncorrelated space. The Gaussian mixture model (GMM) with diagonal covariance matrices is a popular technique for modeling speech feature distributions. These two techniques can be combined to improve the performance of speaker or speech recognition systems. The drawback of this combination is that the two sets of parameters are not optimized jointly. This paper presents a new model structure that integrates both the orthonormal transformation and the diagonal-covariance Gaussian mixture into a unified framework. All parameters of this model are obtained simultaneously by maximum likelihood estimation. This idea is further extended to obtain a new GMM with generalized covariance matrices (GC-GMM). The traditional GMM with diagonal or full covariance matrices is a special case of the GC-GMM. The proposed method is demonstrated on a 100-person connected-digit database for text-independent speaker identification. In comparison with the traditional GMM, the computational complexity and the number of parameters can be greatly reduced with no degradation in system performance. (C) 1999 Elsevier Science B.V. All rights reserved.
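The conventional two-stage combination mentioned in the abstract (a Karhunen-Loeve transform estimated first, then a diagonal-covariance GMM scored in the transformed space) can be sketched as follows. This is an illustrative sketch only and is not taken from the paper: the function names `klt` and `diag_gmm_loglik`, the synthetic data, and the use of NumPy are all assumptions, and the joint maximum likelihood estimation proposed by the authors is not shown.

```python
import numpy as np

def klt(features):
    """Estimate a Karhunen-Loeve transform from row-vector features.

    Returns the sample mean and an orthonormal basis (eigenvectors of the
    sample covariance matrix). Hypothetical helper, not from the paper.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    # eigh handles symmetric matrices and returns orthonormal eigenvectors.
    _, basis = np.linalg.eigh(cov)
    return mean, basis

def diag_gmm_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    x: (frames, dims); weights: (mixtures,); means, variances: (mixtures, dims).
    """
    diff = x[:, None, :] - means[None, :, :]  # (frames, mixtures, dims)
    log_comp = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = log_comp + np.log(weights)
    # Log-sum-exp over mixtures, shifted by the per-frame maximum for stability.
    peak = log_joint.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(per_frame.mean())

# Synthetic correlated 2-D "features" (assumed data, for illustration only).
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8], [0.0, 0.6]])
frames = rng.normal(size=(500, 2)) @ mixing

# Stage 1: decorrelate with the KLT. Stage 2 would fit and score a
# diagonal-covariance GMM on `projected`, independently of stage 1.
mean, basis = klt(frames)
projected = (frames - mean) @ basis
```

In this baseline the transform is fixed before the mixture is trained, which is the separation of the two parameter sets that the abstract identifies as the drawback.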
Pages: 227-241
Number of pages: 15