SYLLABLE: A SELF-CONTAINED UNIT TO MODEL PRONUNCIATION VARIATION

Cited by: 0
Authors
Ng, Raymond W. M. [1]
Hirose, Keikichi [1]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
Source
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012
Keywords
Syllable; pronunciation variation; CONTINUOUS SPEECH RECOGNITION; LANGUAGE
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this paper, we demonstrate the potential of incorporating syllable-level information in acoustic modeling. The unit of syllable is not rigorously defined, which leads to a problem for its use. In this study, we derive syllable structures from the sonorant-band intensity profile of speech signal. We analyze the error statistics of a phone-based context-dependent speech recognizer and find interesting error patterns. Phone errors mainly occur inside a syllable but not at syllable boundaries. Pronunciation variation can thus be regarded as the replacement of phonetic elements within the time span of a solitary syllable. We apply simple rules to model the pronunciation variation phenomenon. A lexical modeling approach modifies the bi-phone transcription in the dictionary. It leads to a significant increase of phone correctness. The results shed light on a more intuitive and direct approach to model pronunciation variation within the scope of syllables.
Pages: 4457-4460
Page count: 4
Related papers
13 in total
[1]  
[Anonymous], 1990, TIMIT AC PHON CONT S
[2]  
Fisher W. M., 1997, SYLLABIFICATION SOFT
[3]   SYLLABLE AS A UNIT OF SPEECH RECOGNITION [J].
FUJIMURA, O .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1975, AS23 (01) :82-87
[4]   Syllable-based large vocabulary continuous speech recognition [J].
Ganapathiraju, A ;
Hamaker, J ;
Picone, J ;
Ordowski, M ;
Doddington, GR .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (04) :358-366
[5]   Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation [J].
Greenberg, S .
SPEECH COMMUNICATION, 1999, 29 (2-4) :159-176
[6]   Modelling pronunciation variation with single-path and multi-path syllable models: Issues to consider [J].
Hamalainen, Annika ;
ten Bosch, Louis ;
Boves, Lou .
SPEECH COMMUNICATION, 2009, 51 (02) :130-150
[7]   CONTEXT-DEPENDENT PHONETIC HIDDEN MARKOV-MODELS FOR SPEAKER-INDEPENDENT CONTINUOUS SPEECH RECOGNITION [J].
LEE, KF .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1990, 38 (04) :599-609
[8]   Automatic language identification: an alternative approach to phonetic modelling [J].
Pellegrino, F ;
Andre-Obrecht, R .
SIGNAL PROCESSING, 2000, 80 (07) :1231-1244
[9]  
Pfitzinger HR, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P1261, DOI 10.1109/ICSLP.1996.607838
[10]   Automatic prosodic variations modeling for language and dialect discrimination [J].
Rouas, Jean-Luc .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (06) :1904-1911