Multi-pass pronunciation adaptation

Authors
Bodenstab, Nathan [1]
Fanty, Mark [2]
Affiliations
[1] Oregon Hlth & Sci Univ, OGI, Portland, OR 97201 USA
[2] Nuance Commun, Sunnyvale, CA USA
Source
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007
Keywords
pronunciation; speech; learning; adaptation
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A mapping between words and pronunciations (potential phonetic realizations) is a key component of speech recognition systems. Traditionally, this mapping has been encoded in a lexicon in which each pronunciation is transcribed by a linguist or generated by a grapheme-to-phoneme algorithm. For large-vocabulary recognition systems, this process is highly susceptible to errors. We present an offline, data-driven algorithm that corrects suboptimal pronunciations using transcribed utterances. Unlike previous data-driven algorithms, which struggle to balance acoustic representation and multi-speaker generalization, our multi-pass approach maximizes both criteria instead of compromising between the two. We demonstrate on an automated name dialing task that our multi-pass algorithm achieves a 70% error rate reduction compared to a baseline grapheme-to-phoneme generated lexicon.
Pages: 865 / +
Page count: 2