Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training

Cited by: 8
Authors
Liu, Rui [1 ]
Hu, Yifan [1 ]
Zuo, Haolin [1 ]
Luo, Zhaojie [2 ]
Wang, Longbiao [3 ]
Gao, Guanglai [1 ]
Affiliations
[1] Inner Mongolia Univ, Dept Comp Sci, Hohhot 010021, Peoples R China
[2] Osaka Univ, SANKEN, Osaka 5670047, Japan
[3] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text-to-speech (TTS); agglutinative; morphology; language modeling; pre-training;
DOI
10.1109/TASLP.2023.3348762
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Text-to-Speech (TTS) aims to convert input text into a human-like voice. With the development of deep learning, encoder-decoder based TTS models achieve superior naturalness in mainstream languages such as Chinese and English. The key lies in the text encoder's ability to learn linguistic information. However, for TTS of low-resource agglutinative languages, the scale of the <text, speech> paired data is limited. How to extract rich linguistic information from small-scale text data to enhance the naturalness of the synthesized speech is therefore an urgent issue. In this paper, we first collect a large unsupervised text corpus for BERT-like language model pre-training, and then adopt the trained language model to extract deep linguistic information from the input text of the TTS model, improving the naturalness of the final synthesized speech. To fully exploit the prosody-related linguistic information in agglutinative languages, we incorporate morphological information into the language model training and construct a morphology-aware masking based BERT model (MAM-BERT). Experimental results based on various advanced TTS models validate the effectiveness of our approach, and comparisons across data scales further confirm its effectiveness in low-resource scenarios.
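The abstract's core idea is to mask whole morphological units (stems, suffixes) during BERT-style pre-training instead of random subword pieces. The sketch below is only an illustration of that masking idea, not the paper's exact algorithm; the word list, segmentation, and function name are hypothetical, and a real morphological analyzer is assumed to have produced the segmented input.

```python
# Minimal sketch of morphology-aware masking for BERT-style pre-training.
# Assumption: each word has already been segmented into morphemes (stem + suffixes)
# by an external morphological analyzer; the analyzer itself is not shown here.
import random

MASK_TOKEN = "[MASK]"

def morphology_aware_mask(segmented_words, mask_prob=0.15, seed=0):
    """Mask whole morphemes rather than random subword pieces.

    segmented_words: list of words, each a list of morpheme strings,
                     e.g. [["surguuli", "-d"], ["yav", "-san"]].
    Returns the flat token sequence and the reconstruction labels.
    """
    rng = random.Random(seed)
    tokens, labels = [], []
    for word in segmented_words:
        for morpheme in word:
            if rng.random() < mask_prob:
                tokens.append(MASK_TOKEN)   # hide the entire morpheme
                labels.append(morpheme)     # the model must reconstruct it
            else:
                tokens.append(morpheme)
                labels.append(None)         # no prediction at this position
    return tokens, labels

if __name__ == "__main__":
    # Toy romanized agglutinative example with a hypothetical segmentation.
    words = [["nom", "-iin"], ["surguuli", "-d"], ["yav", "-san"]]
    print(morphology_aware_mask(words, mask_prob=0.5))
```

Masking at morpheme granularity forces the model to predict prosodically meaningful units (e.g. case or tense suffixes) from context, which is the intuition behind MAM-BERT's use for TTS front-end features.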
Pages: 1075-1087
Page count: 13