Myanmar Text-to-Speech Synthesis Using End-to-End Model

Cited by: 1
Authors
Qin, Qinglai [1]
Yang, Jian [1]
Li, Peiying [1]
Affiliations
[1] Yunnan Univ, Kunming, Yunnan, Peoples R China
Source
2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020 | 2020
Keywords
Myanmar Speech Synthesis; Text-to-Speech; End-to-End; Pre-Trained; Language Model;
DOI
10.1145/3443279.3443295
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a Myanmar speech synthesis system based on an End-to-End neural network model, which integrates the Myanmar phone model into the Tacotron2 End-to-End model. Based on the Seq2seq model architecture, we use phone-level embedding to form a feature prediction network from phone sequences to Mel spectrum, and combine it with a semi-supervised speech generation network to generate high-quality Myanmar synthesized speech. In addition, we introduce a BERT pre-training decoder module to assist the phone feature extraction, which reduces the system's dependence on the phone feature extraction network and improves the richness of the text features. Compared with other Myanmar speech synthesis systems, this method effectively improves the naturalness and accuracy of synthesized speech under low-resource conditions.
Pages: 6-11
Number of pages: 6