Acoustic-articulatory modeling with the trajectory HMM

Times cited: 71
Authors
Zhang, Le [1]
Renals, Steve [1]
Affiliation
[1] Univ Edinburgh, Ctr Speech Technol Res, Sch Informat, Edinburgh EH8 9LW, Midlothian, Scotland
Keywords
articulatory inversion; MOCHA-TIMIT; trajectory hidden Markov model (HMM)
DOI
10.1109/LSP.2008.917004
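Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract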
In this letter, we introduce a hidden Markov model (HMM)-based inversion system to recover articulatory movements from speech acoustics. Trajectory HMMs are used as generative models of articulatory data. Experiments on the MOCHA-TIMIT corpus indicate that jointly trained acoustic-articulatory models are more accurate (lower RMS error) than separately trained ones, and that trajectory HMM training yields greater accuracy than conventional maximum likelihood HMM training. Moreover, the system can synthesize articulatory movements directly from a textual representation.
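The abstract does not spell out how a trajectory HMM produces an articulatory trajectory. As a rough illustration of the general formulation (not necessarily the authors' exact system), a static trajectory c is tied to static-plus-delta observations by o = Wc, so for a fixed state sequence the mean trajectory is c = (W^T S^-1 W)^-1 W^T S^-1 mu, where mu and S stack the per-frame Gaussian means and diagonal variances. The sketch below computes this for a single articulatory channel and scores the result with RMS error, the metric named in the abstract. The delta window 0.5 * (c[t+1] - c[t-1]), the boundary clamping, the toy numbers, and all function names are assumptions chosen for illustration.

```python
import math


def delta_window_matrix(T):
    """Build W (2T x T) so that W @ c interleaves [c_t, delta_c_t] for t = 0..T-1.

    Assumed delta window: 0.5 * (c[t+1] - c[t-1]), with frame indices clamped
    at the utterance boundaries (a hypothetical choice for illustration).
    """
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0                        # static coefficient
        W[2 * t + 1][min(t + 1, T - 1)] += 0.5   # delta: +0.5 * next frame
        W[2 * t + 1][max(t - 1, 0)] -= 0.5       # delta: -0.5 * previous frame
    return W


def solve_linear(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        acc = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = acc / M[i][i]
    return x


def trajectory_mean(mu, var):
    """Mean static trajectory c = (W^T S^-1 W)^-1 W^T S^-1 mu.

    mu, var: length-2T lists of interleaved [static, delta] Gaussian means and
    diagonal variances for one channel, already expanded along a fixed state
    sequence.
    """
    T = len(mu) // 2
    W = delta_window_matrix(T)
    prec = [1.0 / v for v in var]  # diagonal of S^-1
    R = [[sum(W[i][j] * prec[i] * W[i][k] for i in range(2 * T))
          for k in range(T)] for j in range(T)]
    r = [sum(W[i][j] * prec[i] * mu[i] for i in range(2 * T)) for j in range(T)]
    return solve_linear(R, r)


def rms_error(pred, ref):
    """Root-mean-square error between two equal-length trajectories."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(pred, ref)) / len(ref))


if __name__ == "__main__":
    # Toy example: step-shaped static means, zero delta means.
    static_means = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    mu, var = [], []
    for m in static_means:
        mu += [m, 0.0]     # [static mean, delta mean]
        var += [1.0, 0.1]  # a tight delta variance favours a smooth path
    c = trajectory_mean(mu, var)
    print([round(x, 3) for x in c])
    print("RMS vs. step:", round(rms_error(c, static_means), 3))
```

Tightening the delta variance pulls the generated trajectory toward a smoother path; loosening it lets the result approach the raw static means, which is the behaviour that distinguishes trajectory modelling from frame-independent HMM output.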
Pages: 245-248
Number of pages: 4
References (19 in total)
[1] [Anonymous], 2004, P INTERSPEECH
[2] INVERSION OF ARTICULATORY-TO-ACOUSTIC TRANSFORMATION IN VOCAL-TRACT BY A COMPUTER-SORTING TECHNIQUE [J].
ATAL, BS ;
CHANG, JJ ;
MATHEWS, MV ;
TUKEY, JW .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 63 (05) :1535-1555
[3] A self-learning predictive model of articulator movements during speech production [J].
Blackburn, CS ;
Young, S .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2000, 107 (03) :1659-1670
[4] BRIDLE JS, 2004, P INT C SPOK LANG PR, P725, DOI 10.21437/INTERSPEECH.2004-281
[5] Multisyn: Open-domain unit selection for the Festival speech synthesis system [J].
Clark, Robert A. J. ;
Richmond, Korin ;
King, Simon .
SPEECH COMMUNICATION, 2007, 49 (04) :317-330
[6] DUSAN S, 2005, P 5 SEM SPEECH PROD, P237
[7] SPEAKER-INDEPENDENT ISOLATED WORD RECOGNITION USING DYNAMIC FEATURES OF SPEECH SPECTRUM [J].
FURUI, S .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1986, 34 (01) :52-59
[8] King S., 1999, P ICPHS 1999 SAN FRA, P2259
[9] KOBAYASHI T, 1991, INT CONF ACOUST SPEE, P489, DOI 10.1109/ICASSP.1991.150383
[10] INFERRING ARTICULATION AND RECOGNIZING GESTURES FROM ACOUSTICS WITH A NEURAL NETWORK TRAINED ON X-RAY MICROBEAM DATA [J].
PAPCUN, G ;
HOCHBERG, J ;
THOMAS, TR ;
LAROCHE, F ;
ZACKS, J ;
LEVY, S .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1992, 92 (02) :688-700