Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation

Times cited: 180

Author:

Greenberg, S [1]

Affiliation:

[1] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

SPEECH COMMUNICATION | 1999 / Vol. 29 / Issues 2-4

Funding:

U.S. National Science Foundation (NSF);

Keywords:

automatic speech recognition; pronunciation variation; spoken language; syllables;

DOI:

10.1016/S0167-6393(99)00050-3

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

Current-generation automatic speech recognition (ASR) systems model spoken discourse as a quasi-linear sequence of words and phones. Because it is unusual for every phone within a word to be pronounced in a standard ("canonical") way, ASR systems often depend on a multi-pronunciation lexicon to match an acoustic sequence with a lexical unit. Since there are, in practice, many different ways for a word to be pronounced, this standard approach adds a layer of complexity and ambiguity to the decoding process which, if simplified, could potentially improve recognition performance. Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is more systematic at the level of the syllable than at the phonetic-segment level. Thus, syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents. Prosodic prominence and lexical stress also appear to play an important role in pronunciation variation. The governing mechanism is likely to involve the informational valence associated with syllabic and lexical elements, and for this reason pronunciation variation offers a potential window onto the mechanisms responsible for the production and understanding of spoken language. (C) 1999 Elsevier Science B.V. All rights reserved.
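The abstract's central measurement, how often each syllable constituent (onset, nucleus, coda) is realized in its canonical form, can be illustrated with a small sketch. It assumes each syllable has already been split into constituents and that canonical and realized transcriptions are aligned constituent by constituent. The `Syllable` class, the `syl` and `canonical_rates` functions, the ARPAbet-style labels and the five word tokens are all hypothetical and invented for illustration; they are not Switchboard data, and scoring a constituent as canonical only on an exact match is a simplification, not necessarily the procedure used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a syllable split into onset, nucleus and coda,
# each a (possibly empty) tuple of ARPAbet-style phone labels.
@dataclass(frozen=True)
class Syllable:
    onset: Tuple[str, ...]
    nucleus: Tuple[str, ...]
    coda: Tuple[str, ...]

CONSTITUENTS = ("onset", "nucleus", "coda")

def syl(onset: Sequence[str], nucleus: Sequence[str], coda: Sequence[str]) -> Syllable:
    """Convenience constructor that converts lists to tuples."""
    return Syllable(tuple(onset), tuple(nucleus), tuple(coda))

def canonical_rates(pairs: List[Tuple[Syllable, Syllable]]) -> Dict[str, float]:
    """Fraction of syllables whose realized constituent exactly matches the
    canonical one, computed separately for onset, nucleus and coda.
    Syllables whose canonical constituent is empty are not counted."""
    matched = {c: 0 for c in CONSTITUENTS}
    total = {c: 0 for c in CONSTITUENTS}
    for canonical, realized in pairs:
        for c in CONSTITUENTS:
            canon = getattr(canonical, c)
            if not canon:  # e.g. "go" has no coda, "and" has no onset
                continue
            total[c] += 1
            if getattr(realized, c) == canon:
                matched[c] += 1
    return {c: matched[c] / total[c] for c in CONSTITUENTS if total[c]}

# Invented toy tokens as (canonical, realized) pairs; NOT taken from Switchboard.
TOY_PAIRS = [
    (syl([], ["ae"], ["n", "d"]), syl([], ["ae"], ["n"])),          # "and": coda /d/ dropped
    (syl(["jh"], ["ah"], ["s", "t"]), syl(["jh"], ["ih"], ["s"])),  # "just": vowel and coda altered
    (syl(["dh"], ["ae"], ["t"]), syl(["dh"], ["ae"], [])),          # "that": final /t/ dropped
    (syl(["g"], ["ow"], []), syl(["g"], ["ow"], [])),               # "go": fully canonical
    (syl([], ["ih"], ["t"]), syl([], ["ih"], ["t"])),               # "it": fully canonical
]

print(canonical_rates(TOY_PAIRS))  # {'onset': 1.0, 'nucleus': 0.8, 'coda': 0.25}
```

On this toy input the onset is always canonical while the coda is the constituent most often altered, which mirrors the direction of the pattern the abstract describes; the numbers themselves carry no empirical weight.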


Pages: 159-176

Number of pages: 18

References: 53 in total
