A maximum a posteriori approach to speaker adaptation using the trended hidden Markov model

Cited by: 5
Authors
Chengalvarayan, R [1]
Deng, L
Affiliations
[1] Lucent Technol Inc, Lucent Speech Solut, Naperville, IL 60566 USA
[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001 / Vol. 9 / No. 05
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
adaptation; hidden Markov models; MAP estimation; polynomial functions; speech recognition;
DOI
10.1109/89.928919
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
A formulation of the maximum a posteriori (MAP) approach to speaker adaptation is presented with use of the trended or nonstationary-state hidden Markov model (HMM), where the Gaussian means in each HMM state are characterized by time-varying polynomial trend functions of the state sojourn time. Assuming uncorrelatedness among the polynomial coefficients in the trend functions, we have obtained analytical results for the MAP estimates of the parameters including time-varying means and time-invariant precisions. We have implemented a speech recognizer based on these results in speaker adaptation experiments using the TI46 corpora. The experimental evaluation demonstrates that the trended HMM, with use of either the linear or the quadratic polynomial trend function, consistently outperforms the conventional, stationary-state HMM. The evaluation also shows that the unadapted, speaker-independent models are outperformed by the models adapted by the MAP procedure under supervision with as few as a single adaptation token. Further, adaptation of polynomial coefficients alone is shown to be better than adapting both polynomial coefficients and precision matrices when fewer than four adaptation tokens are used, while the reverse is found with a greater number of adaptation tokens.
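As an illustration of the two ideas named in the abstract, the following minimal Python sketch evaluates a polynomial trend mean at a given state sojourn time and computes a textbook MAP estimate of a Gaussian mean. It is not taken from the paper: the function names `trend_mean` and `map_scalar_mean`, the scalar (one-dimensional) feature, and the prior weight `tau` are assumptions made for this example, and the MAP formula shown is only the standard conjugate-prior result for a stationary mean (polynomial order 0), not the paper's analytical estimates for the trend coefficients.

```python
def trend_mean(coeffs, t):
    """Time-varying state mean for a scalar feature (illustrative).

    coeffs[p] is the p-th polynomial coefficient and t is the time
    elapsed since the state was entered (the sojourn time), so the
    mean is coeffs[0] + coeffs[1]*t + coeffs[2]*t**2 + ...
    """
    return sum(b * t ** p for p, b in enumerate(coeffs))


def map_scalar_mean(prior_mean, tau, samples):
    """Textbook MAP estimate of a Gaussian mean with known variance.

    Assumes a conjugate Gaussian prior centred on prior_mean whose
    weight tau acts like a count of pseudo-observations. This is the
    stationary (order-0) case only, shown for intuition.
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)


# Quadratic trend 1 + 0.5*t + 0.25*t**2 evaluated at sojourn time t = 2.
mean_at_2 = trend_mean([1.0, 0.5, 0.25], 2)

# Speaker-independent prior mean 0.0 with weight 2, adapted on two tokens.
adapted = map_scalar_mean(0.0, 2.0, [1.0, 1.0])
```

With few adaptation samples the estimate stays close to the prior (speaker-independent) mean, and it moves toward the sample average as more data arrive, which is the general behaviour that makes MAP adaptation usable with very little adaptation data.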
Pages: 549-557
Page count: 9
Related papers
18 in total
[1]   Speech trajectory discrimination using the minimum classification error learning [J].
Chengalvarayan, R ;
Deng, L .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (06) :505-515
[2]   Use of generalized dynamic feature parameters for speech recognition [J].
Chengalvarayan, R ;
Deng, L .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (03) :232-242
[3]  
CHENGALVARAYAN R, 1997, P ICASSP, V2, P1415
[4]   COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J].
DAVIS, SB ;
MERMELSTEIN, P .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) :357-366
[5]  
DeGroot M., 1970, OPTIMAL STAT DECISIO
[6]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[7]   A GENERALIZED HIDDEN MARKOV MODEL WITH STATE-CONDITIONED TREND FUNCTIONS OF TIME FOR THE SPEECH SIGNAL [J].
DENG, L .
SIGNAL PROCESSING, 1992, 27 (01) :65-78
[8]   Speaker-independent phonetic classification using hidden Markov models with mixtures of trend functions [J].
Deng, L ;
Aksmanovic, M .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (04) :319-324
[9]   Speech Recognition Using Hidden Markov Models with Polynomial Regression Functions as Nonstationary States [J].
Deng, Li ;
Aksmanovic, Mike ;
Sun, Xiaodong ;
Wu, C. F. Jeff .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (04) :507-520
[10]   Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains [J].
Gauvain, Jean-Luc ;
Lee, Chin-Hui .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02) :291-298