Enhancing Sequence-to-Sequence Text-to-Speech with Morphology

Cited by: 3
Authors
Taylor, Jason [1]
Richmond, Korin [1]
Affiliation
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Source
INTERSPEECH 2020 | 2020
Keywords
Speech Synthesis; Sequence-to-Sequence; Morphology; Pronunciation;
DOI
10.21437/Interspeech.2020-1547
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Neural sequence-to-sequence (S2S) modelling encodes a single, unified representation for each input sequence. When used for text-to-speech synthesis (TTS), such representations must embed ambiguities between English spelling and pronunciation. For example, in pothole and there the character sequence th sounds different. This can be problematic when predicting pronunciation directly from letters. We posit pronunciation becomes easier to predict when letters are grouped into subword units like morphemes (e.g. a boundary lies between t and h in pothole but not there). Moreover, morphological boundaries can reduce the total number of, and increase the counts of, seen unit subsequences. Accordingly, we test here the effect of augmenting input sequences of letters with morphological boundaries. We find morphological boundaries substantially lower the Word and Phone Error Rates (WER and PER) for a Bi-LSTM performing G2P on one hand, and also increase the naturalness scores of Tacotrons performing TTS in a MUSHRA listening test on the other. The improvements to TTS quality are such that grapheme input augmented with morphological boundaries outperforms phone input without boundaries. Since morphological segmentation may be predicted with high accuracy, we highlight this simple pre-processing step has important potential for S2S modelling in TTS.
Pages: 1738 - 1742
Number of pages: 5
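
Illustrative sketch (not from the paper): the abstract describes augmenting letter input with morphological boundaries so that, for example, the letters t and h in pothole are no longer adjacent. A minimal Python sketch of that pre-processing step follows. The boundary symbol `+`, the function names, and the hand-written segmentations are assumptions made for illustration; the record does not state which symbol or segmenter the authors used.

```python
# Hypothetical boundary token; the actual symbol used by the authors is not given here.
BOUNDARY = "+"


def augment_with_boundaries(morphemes, boundary=BOUNDARY):
    """Flatten a list of morphemes into letter tokens, with a boundary
    token inserted between consecutive morphemes."""
    tokens = []
    for index, morpheme in enumerate(morphemes):
        if index > 0:
            tokens.append(boundary)
        tokens.extend(morpheme)  # one token per letter
    return tokens


def bigrams(tokens):
    """Return the adjacent token pairs of a sequence."""
    return list(zip(tokens, tokens[1:]))


# Segmentations are supplied by hand here; in practice they would come
# from a morphological segmenter or lexicon.
plain_pothole = list("pothole")
augmented_pothole = augment_with_boundaries(["pot", "hole"])
augmented_there = augment_with_boundaries(["there"])

print(augmented_pothole)
# The pair (t, h) is adjacent in the plain letters of "pothole",
# separated after augmentation, and still adjacent in "there".
print(("t", "h") in bigrams(plain_pothole))
print(("t", "h") in bigrams(augmented_pothole))
print(("t", "h") in bigrams(augmented_there))
```

With boundaries in place, a sequence model no longer sees the same `t h` context for both words, which is the ambiguity the abstract points to.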