Myanmar Text-to-Speech Synthesis Using End-to-End Model

Times cited: 1

Authors:

Qin, Qinglai [1]

Yang, Jian [1]

Li, Peiying [1]

Affiliation:

[1] Yunnan Univ, Kunming, Yunnan, Peoples R China

Source:

2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020 | 2020

Keywords:

Myanmar Speech Synthesis; Text-to-Speech; End-to-End; Pre-Trained; Language Model

DOI:

10.1145/3443279.3443295

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

In this paper, we propose a Myanmar speech synthesis system based on an end-to-end neural network model that integrates a Myanmar phone model into the Tacotron2 end-to-end model. Building on the Seq2seq architecture, we use phone-level embeddings to form a feature prediction network that maps phone sequences to Mel spectra, and combine it with a semi-supervised speech generation network to produce high-quality synthesized Myanmar speech. In addition, we introduce a pre-trained BERT decoder module to assist phone feature extraction, which reduces the system's dependence on the phone feature extraction network and enriches the text features. Compared with other Myanmar speech synthesis systems, this method improves the naturalness and accuracy of synthesized speech under low-resource conditions.
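The abstract describes phone-level embeddings feeding a Tacotron2-style encoder, with a pre-trained BERT module enriching the text features, but the record does not say how the two are combined. The following is a minimal sketch assuming one common option: each phone embedding is concatenated with the contextual vector of the token it belongs to. The phone symbols, dimensions, random initialization, and the helpers `build_phone_table`, `expand_to_phones` and `encoder_inputs` are all hypothetical; a real system would use learned embeddings and actual BERT outputs.

```python
import random
from typing import Dict, List

Vector = List[float]


def build_phone_table(phones: List[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical phone-embedding table: one random vector per phone symbol.

    In a trained Tacotron2-style model these vectors would be learned parameters.
    """
    rng = random.Random(seed)
    return {p: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for p in phones}


def expand_to_phones(token_feats: List[Vector], phones_per_token: List[int]) -> List[Vector]:
    """Repeat each token-level feature vector once for every phone in that token."""
    out: List[Vector] = []
    for vec, n in zip(token_feats, phones_per_token):
        out.extend(list(vec) for _ in range(n))
    return out


def encoder_inputs(
    phone_seq: List[str],
    phone_table: Dict[str, Vector],
    token_feats: List[Vector],
    phones_per_token: List[int],
) -> List[Vector]:
    """Concatenate each phone embedding with its token's contextual vector (assumed fusion)."""
    if len(token_feats) != len(phones_per_token):
        raise ValueError("need one phone count per token feature vector")
    if len(phone_seq) != sum(phones_per_token):
        raise ValueError("phone count does not match token segmentation")
    context = expand_to_phones(token_feats, phones_per_token)
    # List concatenation builds a new (embedding + context) vector per phone.
    return [phone_table[p] + c for p, c in zip(phone_seq, context)]


# Toy data: 4 hypothetical phones with 4-dim embeddings, and
# 2 stand-in "BERT" token vectors of 2 dims each (2 + 3 phones).
table = build_phone_table(["m", "a", "n", "g"], dim=4)
token_feats = [[0.1, 0.2], [0.3, 0.4]]
x = encoder_inputs(["m", "a", "n", "g", "a"], table, token_feats, [2, 3])
print(len(x), len(x[0]))  # 5 6
```

Repeating each token vector across its phones keeps every encoder input step at phone resolution; other fusion choices, such as adding a projected token vector or attending over token features, are equally plausible under this record's description.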

Pages: 6-11

Number of pages: 6

References (25 in total):

[1] [Anonymous], 2017, Char2wav: End-to-end speech synthesis
[2] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[3] Graves A, 2012, STUD COMPUT INTELL, V385, P1, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]
[4] Hlaing A M, 2017, INT C PACIFIC ASS CO
[5] Ito Keith, 2017, LJ SPEECH DATASET
[6] Li B, 2019, INT CONF ACOUST SPEE, P5621, DOI 10.1109/ICASSP.2019.8682674
[7] Li J, Wu Z, Li R, Zhi P, Yang S, Meng H, 2019, INTERSPEECH 2019, P4494-4498, Knowledge-based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
[8] Lu YF, 2019, INT CONF ACOUST SPEE, P7050, DOI 10.1109/ICASSP.2019.8682368
[9] Arik SO, 2017, Arxiv, DOI arXiv:1702.07825
[10] Park K, Mulc T, 2019, INTERSPEECH 2019, P1566-1570, CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

← 1 2 3 →