Learning name pronunciations in automatic speech recognition systems

Cited by: 10
Authors
Beaufays, F
Sankar, A
Williams, S
Weintraub, M
Source
15TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS | 2003
DOI
10.1109/TAI.2003.1250196
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many speech recognition systems that provide over-the-phone services, such as name dialers, stock quote providers, and location finders, rely on the accurate recognition of proper names. To do so, the systems need to know how their users will pronounce these words. However, predicting the pronunciation of a proper name is notoriously difficult: besides the spelling of the word, it depends on the origin of the name, the linguistic background of the speaker, and other cultural and sociological factors. In this paper, we describe a data-driven method that learns proper name pronunciations from audio samples of these words. The algorithm relies on the machinery of a general-purpose speech recognizer to find the phone sequence that best matches the sample speech waveforms. In addition, it incorporates linguistic knowledge automatically acquired from a pronunciation dictionary to ensure that the learned pronunciations are "reasonable" from a linguistic viewpoint. We show on a corporate name dialing database that the proposed algorithm reduces the call routing error rate by 40% compared to a reference letter-to-phone pronunciation engine.
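The abstract does not say how the acoustic match and the dictionary knowledge are combined. As a purely illustrative sketch, assuming a simple log-linear combination, the Python code below scores each candidate pronunciation by summing its acoustic log-likelihoods over all audio samples of a name and adding a weighted log-probability from a phone bigram model trained on dictionary pronunciations. The function names (`train_phone_bigram`, `pick_pronunciation`), the `prior_weight` parameter, the toy dictionary, and the acoustic scores are all hypothetical; this is not the authors' algorithm.

```python
import math
from collections import defaultdict

# Hypothetical sketch, NOT the algorithm from the paper: it only illustrates
# combining acoustic evidence with a dictionary-derived phonotactic prior.


def train_phone_bigram(dictionary_prons, alpha=1.0):
    """Train an add-alpha smoothed phone bigram model from dictionary
    pronunciations (sequences of phone symbols) and return a function that
    scores a phone sequence with its log-probability."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for pron in dictionary_prons:
        seq = ["<s>"] + list(pron) + ["</s>"]
        vocab.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    vocab_size = len(vocab)

    def log_prob(pron):
        seq = ["<s>"] + list(pron) + ["</s>"]
        total = 0.0
        for prev, nxt in zip(seq, seq[1:]):
            row = counts.get(prev, {})  # .get avoids inserting unseen phones
            row_total = sum(row.values())
            total += math.log((row.get(nxt, 0) + alpha) / (row_total + alpha * vocab_size))
        return total

    return log_prob


def pick_pronunciation(candidates, acoustic_scores, prior, prior_weight=1.0):
    """Return the candidate maximizing
    sum of per-sample acoustic log-likelihoods + prior_weight * prior log-prob.

    acoustic_scores[i][j] is the (hypothetical) log-likelihood of audio
    sample j under candidate i, e.g. as a recognizer might report it."""
    best, best_score = None, -math.inf
    for cand, per_sample in zip(candidates, acoustic_scores):
        score = sum(per_sample) + prior_weight * prior(cand)
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    # Toy dictionary and made-up acoustic scores, for illustration only.
    prior = train_phone_bigram([["k", "ae", "t"], ["k", "ae", "p"], ["b", "ae", "t"]])
    cands = [("k", "ae", "t"), ("k", "t", "ae")]
    # Equal acoustic evidence: the dictionary prior breaks the tie.
    print(pick_pronunciation(cands, [[-10.0, -11.0], [-10.0, -11.0]], prior))  # ('k', 'ae', 't')
    # Much stronger acoustic evidence for the second candidate overrides the prior.
    print(pick_pronunciation(cands, [[-50.0, -50.0], [-10.0, -10.0]], prior))  # ('k', 't', 'ae')
```

Here `prior_weight` trades off fidelity to the recorded audio against linguistic plausibility. Since the abstract says the recognizer itself finds the best-matching phone sequence, a fuller implementation would search over phone sequences rather than rerank a hand-supplied candidate list as this sketch does.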
Pages: 233-240
Page count: 8