SYLLABLE: A SELF-CONTAINED UNIT TO MODEL PRONUNCIATION VARIATION

Cited by: 0

Authors:

Ng, Raymond W. M. [1]

Hirose, Keikichi [1]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan

Source:

2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012

Keywords:

Syllable; pronunciation variation; CONTINUOUS SPEECH RECOGNITION; LANGUAGE;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we demonstrate the potential of incorporating syllable-level information in acoustic modeling. The syllable is not rigorously defined as a unit, which complicates its use. In this study, we derive syllable structures from the sonorant-band intensity profile of the speech signal. We analyze the error statistics of a phone-based, context-dependent speech recognizer and find that phone errors mainly occur inside syllables rather than at syllable boundaries. Pronunciation variation can thus be regarded as the replacement of phonetic elements within the time span of a single syllable. We apply simple rules to model this pronunciation variation through a lexical modeling approach that modifies the bi-phone transcriptions in the dictionary, which leads to a significant increase in phone correctness. The results point to a more intuitive and direct approach to modeling pronunciation variation within the scope of syllables.

Citation

Pages: 4457-4460

Page count: 4

13 references in total

[1] [Anonymous], 1990, TIMIT AC PHON CONT S

[2] Fisher W. M., 1997, SYLLABIFICATION SOFT

[3] SYLLABLE AS A UNIT OF SPEECH RECOGNITION [J]. FUJIMURA, O. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1975, AS23 (01): 82-87

[4] Syllable-based large vocabulary continuous speech recognition [J]. Ganapathiraju, A; Hamaker, J; Picone, J; Ordowski, M; Doddington, GR. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (04): 358-366

[5] Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation [J]. Greenberg, S. SPEECH COMMUNICATION, 1999, 29 (2-4): 159-176

[6] Modelling pronunciation variation with single-path and multi-path syllable models: Issues to consider [J]. Hamalainen, Annika; ten Bosch, Louis; Boves, Lou. SPEECH COMMUNICATION, 2009, 51 (02): 130-150

[7] CONTEXT-DEPENDENT PHONETIC HIDDEN MARKOV-MODELS FOR SPEAKER-INDEPENDENT CONTINUOUS SPEECH RECOGNITION [J]. LEE, KF. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1990, 38 (04): 599-609

[8] Automatic language identification: an alternative approach to phonetic modelling [J]. Pellegrino, F; Andre-Obrecht, R. SIGNAL PROCESSING, 2000, 80 (07): 1231-1244

[9] Pfitzinger HR, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P1261, DOI 10.1109/ICSLP.1996.607838

[10] Automatic prosodic variations modeling for language and dialect discrimination [J]. Rouas, Jean-Luc. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (06): 1904-1911
