Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech

Cited by: 0
Authors
Oo, Yin May [1 ,2 ]
Wattanavekin, Theeraphol
Li, Chenfang [2 ]
De Silva, Pasindu
Sarin, Supheakmungkol
Pipatsrisawat, Knot
Jansche, Martin [2 ]
Kjartansson, Oddur
Gutkin, Alexander
Affiliations
[1] Google Res, Singapore, Singapore
[2] Google, Mountain View, CA 94043 USA
Source
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020
Keywords
speech corpora; finite-state grammars; low-resource; text-to-speech; open-source; Burmese; RECURRENT NEURAL-NETWORK; MODELS;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203 ; 0835 ;
Abstract
This paper introduces an open-source crowd-sourced multi-speaker speech corpus along with a comprehensive set of finite-state transducer (FST) grammars for performing text normalization for the Burmese (Myanmar) language. We also introduce open-source finite-state grammars for performing grapheme-to-phoneme (G2P) conversion for Burmese. These three components are necessary (but not sufficient) for building a high-quality text-to-speech (TTS) system for Burmese, a tonal Southeast Asian language from the Sino-Tibetan family that presents several linguistic challenges. We describe the corpus acquisition process and provide the details of our finite-state approach to Burmese text normalization and G2P. Our experiments involve building a multi-speaker TTS system based on long short-term memory (LSTM) recurrent neural network (RNN) models, which were previously shown to perform well for other languages in low-resource settings. Our results indicate that the data and grammars we are releasing are sufficient to build reasonably high-quality models comparable to other systems. We hope these resources will facilitate speech and language research on Burmese, which many consider low-resource due to the limited availability of free linguistic data.
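The G2P conversion mentioned in the abstract is, at its core, a string-rewriting problem that FST grammars solve by composing rule transducers. As a very loose illustration only, the sketch below simulates one such rule layer with greedy longest-match rewriting in plain Python; the rule table uses hypothetical Latin-script graphemes, not the paper's actual Burmese grammars, which are built with full FST toolkits.

```python
# Toy sketch of one FST-style G2P rule layer as greedy longest-match
# rewriting. The grapheme-to-phoneme rules here are invented Latin-script
# examples (e.g. "ng" -> /N/), standing in for real Burmese rules.

RULES = {          # grapheme sequence -> phoneme symbol; longest match wins
    "ng": "N",
    "sh": "S",
    "a": "a",
    "i": "i",
    "n": "n",
    "g": "g",
    "s": "s",
}

def g2p(word: str) -> str:
    """Apply RULES left to right, always taking the longest matching chunk."""
    out, i = [], 0
    while i < len(word):
        for length in range(len(word) - i, 0, -1):   # try longest chunk first
            chunk = word[i:i + length]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += length
                break
        else:
            out.append(word[i])   # unknown symbol: pass through unchanged
            i += 1
    return " ".join(out)

print(g2p("sang"))   # "ng" outranks "n" + "g" thanks to longest match
```

A real system would express each such layer as a weighted transducer and compose the layers (normalization, syllabification, tone assignment) into a single FST, which handles ambiguity and context-dependent rules far beyond what this greedy sketch can.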
Pages: 6328-6339
Page count: 12