GRAPHEME-TO-PHONEME CONVERSION USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS

Authors
Rao, Kanishka [1]
Peng, Fuchun [1]
Sak, Hasim [1]
Beaufays, Francoise [1]
Affiliation
[1] Google Inc, Mountain View, CA 94043 USA
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
speech recognition; pronunciation; RNN; LSTM; G2P; CTC
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Grapheme-to-phoneme (G2P) models are key components of speech recognition and text-to-speech systems, as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence-based G2P approaches, LSTMs can take the full grapheme context into account, which turns the problem from a series of grapheme-to-phoneme conversions into a single word-to-pronunciation conversion. Training joint-sequence-based G2P models requires explicit grapheme-to-phoneme alignments, which are not straightforward to obtain because graphemes and phonemes do not correspond one-to-one. The LSTM-based approach forgoes the need for such explicit alignments. We experiment with unidirectional LSTMs (ULSTMs) with different kinds of output delays and with deep bidirectional LSTMs (DBLSTMs) with a connectionist temporal classification (CTC) layer. The DBLSTM-CTC model achieves a word error rate (WER) of 25.8% on the public CMU dataset for US English. Combining the DBLSTM-CTC model with a joint n-gram model yields a WER of 21.3%, a 9% relative improvement over the previous best WER of 23.4%, obtained by a hybrid system.
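The abstract states that a CTC output layer removes the need for explicit grapheme-to-phoneme alignments, and it reports WER figures together with a relative improvement. The Python sketch below is not the authors' implementation; it is a minimal illustration, under stated assumptions, of (a) standard CTC best-path (greedy) decoding, which turns one output label per input grapheme into a phoneme sequence by merging consecutive repeats and then dropping blanks, (b) a word-level error rate, and (c) the relative-improvement arithmetic (23.4% to 21.3% is about 9% relative). The blank symbol name `<b>`, all function names, and the single-reference WER convention are hypothetical; the paper may decode with beam search and may score words that have several reference pronunciations differently. Phonemes are written in ARPAbet without stress markers for illustration.

```python
from typing import Dict, List, Optional, Sequence

BLANK = "<b>"  # hypothetical name for the CTC blank label


def ctc_best_path(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Collapse a per-frame CTC label path: merge consecutive repeats, then drop blanks."""
    output: List[str] = []
    prev: Optional[str] = None
    for label in frame_labels:
        # Emit a label only when it starts a new run and is not the blank.
        if label != prev and label != blank:
            output.append(label)
        prev = label  # a blank between two identical labels keeps both
    return output


def greedy_decode(frame_scores: Sequence[Dict[str, float]], blank: str = BLANK) -> List[str]:
    """Pick the highest-scoring label per frame, then apply the CTC collapse."""
    path = [max(scores, key=scores.get) for scores in frame_scores]
    return ctc_best_path(path, blank)


def word_error_rate(hyps: Sequence[Sequence[str]], refs: Sequence[Sequence[str]]) -> float:
    """Percentage of words whose predicted pronunciation differs from the reference.

    Assumption: exactly one reference pronunciation per word.
    """
    if not refs or len(hyps) != len(refs):
        raise ValueError("need one hypothesis per reference word")
    errors = sum(1 for h, r in zip(hyps, refs) if list(h) != list(r))
    return 100.0 * errors / len(refs)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative error reduction in percent, e.g. a WER drop from baseline to new."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    # Six graphemes of "cheese" -> six frames -> phonemes CH IY Z.
    path = ["CH", "CH", "IY", BLANK, "Z", BLANK]
    print(ctc_best_path(path))  # ['CH', 'IY', 'Z']
    print(round(relative_improvement(23.4, 21.3), 1))  # 9.0
```

Because the network may emit the blank or repeat a label on any frame, the input and output sequences can differ in length without a precomputed alignment, which is the property the abstract attributes to the DBLSTM-CTC model.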
Pages: 4225-4229
Page count: 5