Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation

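Times cited: 180
Author
Greenberg, S [1]
Affiliation
[1] International Computer Science Institute, Berkeley, CA 94704, USA
Funding
National Science Foundation (USA)
Keywords
automatic speech recognition; pronunciation variation; spoken language; syllables
DOI
10.1016/S0167-6393(99)00050-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract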
Current-generation automatic speech recognition (ASR) systems model spoken discourse as a quasi-linear sequence of words and phones. Because it is unusual for every phone within a word to be pronounced in a standard ("canonical") way, ASR systems often depend on a multi-pronunciation lexicon to match an acoustic sequence with a lexical unit. Since there are, in practice, many different ways for a word to be pronounced, this standard approach adds a layer of complexity and ambiguity to the decoding process which, if simplified, could potentially improve recognition performance. Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is more systematic at the level of the syllable than at the phonetic-segment level. Thus, syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents. Prosodic prominence and lexical stress also appear to play an important role in pronunciation variation. The governing mechanism is likely to involve the informational valence associated with syllabic and lexical elements, and for this reason pronunciation variation offers a potential window onto the mechanisms responsible for the production and understanding of spoken language. (C) 1999 Elsevier Science B.V. All rights reserved.
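The abstract's central measurement is how often each syllable constituent (onset, nucleus, coda) surfaces in its canonical form. A minimal Python sketch of that count follows. It assumes each syllable has already been paired with a canonical (dictionary) and a realized (transcribed) phone sequence and split into constituents. The function name `canonical_rates`, the data layout, and the four ARPAbet-style toy transcriptions are hypothetical illustrations. They are not the paper's procedure and not Switchboard data.

```python
from __future__ import annotations

# Constituent labels in a fixed order (hypothetical layout, not the paper's).
POSITIONS = ("onset", "nucleus", "coda")


def canonical_rates(syllables: list[tuple[tuple, tuple]]) -> dict[str, float | None]:
    """Per-constituent rate of canonical realization.

    Each syllable is a (canonical, realized) pair; each side is a 3-tuple
    (onset, nucleus, coda) of phone tuples. A constituent is counted only if
    it is non-empty in the canonical form, and it counts as "canonical" when
    the realized phones match the canonical phones exactly.
    """
    matches = {p: 0 for p in POSITIONS}
    totals = {p: 0 for p in POSITIONS}
    for canonical, realized in syllables:
        for pos, canon, real in zip(POSITIONS, canonical, realized):
            if not canon:  # e.g. a vowel-initial syllable has no onset
                continue
            totals[pos] += 1
            if canon == real:
                matches[pos] += 1
    return {p: (matches[p] / totals[p] if totals[p] else None) for p in POSITIONS}


# Invented toy transcriptions (hypothetical, for illustration only).
toy = [
    # "and":  /ae n d/ -> [eh n]
    (((), ("ae",), ("n", "d")), ((), ("eh",), ("n",))),
    # "that": /dh ae t/ -> [dh ae]
    ((("dh",), ("ae",), ("t",)), (("dh",), ("ae",), ())),
    # "just": /jh ah s t/ -> [jh ih s]
    ((("jh",), ("ah",), ("s", "t")), (("jh",), ("ih",), ("s",))),
    # "the":  /dh ah/ -> [dh ax]
    ((("dh",), ("ah",), ()), (("dh",), ("ax",), ())),
]
print(canonical_rates(toy))
# {'onset': 1.0, 'nucleus': 0.25, 'coda': 0.0}
```

Constituents that are empty in the canonical form, such as the missing onset of "and", are excluded from the denominator. Each rate is therefore conditioned on the constituent existing in the dictionary form. This is one plausible counting choice among several.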
Pages: 159–176
Page count: 18