Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring

Cited by: 0
Authors
Novak, Josef R. [1]
Dixon, Paul R.
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Hori, Chiori
Kashioka, Hideki
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
Source
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012
Keywords
G2P; Alignment; RNNLM; WFST
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work introduces a modified WFST-based, EM-driven multiple-to-multiple alignment algorithm for Grapheme-to-Phoneme (G2P) conversion, and presents preliminary experimental results from applying a Recurrent Neural Network Language Model (RNNLM) as an N-best rescoring mechanism for G2P conversion. The alignment algorithm leverages the WFST framework and introduces several simple structural constraints, which yield a small but consistent improvement in Word Accuracy (WA) on a selection of standard baselines. The RNNLM rescoring further extends these gains and achieves state-of-the-art performance on four standard G2P datasets. The system is also shown to be significantly faster than existing solutions. Finally, the complete WFST-based G2P framework is provided as an open-source toolkit.
Pages: 2525-2528
Page count: 4