GRAPHEME-TO-PHONEME CONVERSION USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS

Cited by: 0

Authors:

Rao, Kanishka [1]

Peng, Fuchun [1]

Sak, Hasim [1]

Beaufays, Francoise [1]

Affiliations:

[1] Google Inc, Mountain View, CA 94043 USA

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

speech recognition; pronunciation; RNN; LSTM; G2P; CTC;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems, as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence based G2P approaches, LSTMs have the flexibility to take the full context of graphemes into consideration, and they transform the problem from a series of grapheme-to-phoneme conversions into a word-to-pronunciation conversion. Training joint-sequence based G2P models requires explicit grapheme-to-phoneme alignments, which are not straightforward since graphemes and phonemes do not correspond one-to-one. The LSTM based approach forgoes the need for such explicit alignments. We experiment with unidirectional LSTM (ULSTM) with different kinds of output delays and deep bidirectional LSTM (DBLSTM) with a connectionist temporal classification (CTC) layer. The DBLSTM-CTC model achieves a word error rate (WER) of 25.8% on the public CMU dataset for US English. Combining the DBLSTM-CTC model with a joint n-gram model results in a WER of 21.3%, which is a 9% relative improvement compared to the previous best WER of 23.4% from a hybrid system.
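
Two points in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first function shows the standard CTC output rule (merge consecutive repeated labels, then drop blanks), which is why a CTC layer needs no explicit grapheme-to-phoneme alignment. The second reproduces the quoted relative improvement from the two WER figures. The blank symbol `_`, the function names, and the example frame sequence are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; the paper does not specify one


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapsing rule to a per-frame label sequence."""
    # Step 1: merge runs of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blank labels.
    return [label for label in merged if label != blank]


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Made-up frame-level output for illustration (not data from the paper).
frames = ["_", "G", "G", "_", "UW", "UW", "_", "_", "G", "AH", "L", "_"]
print(ctc_collapse(frames))

# WER figures quoted in the abstract: 23.4% (previous best) and 21.3% (combined).
print(round(100 * relative_reduction(23.4, 21.3), 1))
```

Because many different frame sequences collapse to the same phoneme string, training can sum over all of them instead of committing to a single alignment.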

Pages: 4225-4229

Page count: 5
