Pronunciation modeling by sharing Gaussian densities across phonetic models

Citations: 59
Authors
Saraçlar, M
Nock, H
Khudanpur, S
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Funding
National Science Foundation (USA)
DOI
10.1006/csla.2000.0140
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of pronunciation models during recognition is known to improve accuracy. This paper describes the incorporation of pronunciation models into acoustic model training in addition to recognition. Subtle difficulties in the straightforward use of alternatives to canonical pronunciations are first illustrated: it is shown that simply improving the accuracy of the phonetic transcription used for acoustic model training is of little benefit. Acoustic models trained on the most accurate phonetic transcriptions result in worse recognition than acoustic models trained on canonical baseforms. Analysis of this counterintuitive result leads to a new method of accommodating nonstandard pronunciations: rather than allowing a phoneme in the canonical pronunciation to be realized as one of a few distinct alternate phones, the hidden Markov model (HMM) states of the phoneme's model are instead allowed to share Gaussian mixture components with the HMM states of the model(s) of the alternate realization(s). Qualitatively, this amounts to making a soft decision about which surface form is realized. Quantitatively, experiments show that this method is particularly well suited for acoustic model training for spontaneous speech: a 1.7% (absolute) improvement in recognition accuracy on the Switchboard corpus is presented. © 2000 Academic Press.
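The sharing idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it uses one-dimensional Gaussians, made-up parameters, and a hypothetical `alt_mass` weight (in a real system the mixture weights would be re-estimated from data). It only shows how letting a canonical phoneme's HMM state borrow the Gaussian components of an alternate phone's state raises the likelihood of a frame that resembles the alternate realization, without a hard switch to the alternate model.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_likelihood(x, components):
    """Likelihood of x under a mixture given as (weight, mean, var) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def share_components(canonical, alternate, alt_mass):
    """Return a mixture for the canonical state that also contains the
    alternate state's Gaussians.

    alt_mass is a hypothetical knob (an assumption of this sketch): the
    total weight assigned to the borrowed components.
    """
    if not 0.0 <= alt_mass <= 1.0:
        raise ValueError("alt_mass must be in [0, 1]")
    kept = [((1.0 - alt_mass) * w, m, v) for w, m, v in canonical]
    borrowed = [(alt_mass * w, m, v) for w, m, v in alternate]
    return kept + borrowed


# Toy output distributions for one HMM state each (made-up numbers).
state_canonical = [(0.6, 0.0, 1.0), (0.4, 1.0, 1.0)]  # canonical phoneme
state_alternate = [(1.0, 5.0, 1.0)]                   # alternate surface phone

state_shared = share_components(state_canonical, state_alternate, alt_mass=0.3)

frame = 4.8  # an observation that resembles the alternate realization
lik_canonical = mixture_likelihood(frame, state_canonical)
lik_shared = mixture_likelihood(frame, state_shared)
print(f"canonical only: {lik_canonical:.6f}")
print(f"with sharing:   {lik_shared:.6f}")
```

Because the borrowed components sit inside the canonical state's mixture, the choice between surface forms is made softly, frame by frame, through the component weights rather than by replacing the phone in the transcription.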
Pages: 137-160
Page count: 24