Enhancing Sequence-to-Sequence Text-to-Speech with Morphology

Cited by: 3

Authors:

Taylor, Jason [1]

Richmond, Korin [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

INTERSPEECH 2020 | 2020

Keywords:

Speech Synthesis; Sequence-to-Sequence; Morphology; Pronunciation;

DOI:

10.21437/Interspeech.2020-1547

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

Neural sequence-to-sequence (S2S) modelling encodes a single, unified representation for each input sequence. When used for text-to-speech synthesis (TTS), such representations must embed ambiguities between English spelling and pronunciation. For example, in pothole and there the character sequence th sounds different. This can be problematic when predicting pronunciation directly from letters. We posit pronunciation becomes easier to predict when letters are grouped into subword units like morphemes (e.g. a boundary lies between t and h in pothole but not there). Moreover, morphological boundaries can reduce the total number of, and increase the counts of, seen unit subsequences. Accordingly, we test here the effect of augmenting input sequences of letters with morphological boundaries. We find morphological boundaries substantially lower the Word and Phone Error Rates (WER and PER) for a Bi-LSTM performing G2P on one hand, and also increase the naturalness scores of Tacotrons performing TTS in a MUSHRA listening test on the other. The improvements to TTS quality are such that grapheme input augmented with morphological boundaries outperforms phone input without boundaries. Since morphological segmentation may be predicted with high accuracy, we highlight this simple pre-processing step has important potential for S2S modelling in TTS.
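The input augmentation and the two G2P metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The boundary symbol `|`, the function names, and the toy phone symbols are all assumptions made for illustration, and the metric definitions follow common G2P practice (PER as phone-level edit distance over reference length, WER as the fraction of words with any error) rather than anything stated in this record.

```python
from typing import List, Sequence

# Hypothetical boundary symbol; the record does not say which symbol the paper uses.
BOUNDARY = "|"


def augment_with_boundaries(morphemes: List[str], boundary: str = BOUNDARY) -> List[str]:
    """Convert a morpheme segmentation into a letter sequence with boundary tokens."""
    tokens: List[str] = []
    for i, morpheme in enumerate(morphemes):
        if i > 0:
            tokens.append(boundary)  # mark the boundary between adjacent morphemes
        tokens.extend(morpheme)      # one token per letter
    return tokens


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Total phone edits divided by total reference phones (assumed definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def word_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Fraction of words whose predicted phone sequence is not exact (assumed definition)."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


if __name__ == "__main__":
    # "pothole" has a boundary between t and h; "there" has none.
    print(augment_with_boundaries(["pot", "hole"]))
    print(augment_with_boundaries(["there"]))

    # Toy phone sequences, not real dictionary transcriptions.
    refs = [["p", "a", "t"], ["d", "e"]]
    hyps = [["p", "a", "t"], ["d", "i"]]
    print(phone_error_rate(refs, hyps), word_error_rate(refs, hyps))
```

With the boundary token in place, the letters t and h in pothole are no longer adjacent in the model input, whereas in there they still are, which is the distinction the abstract describes.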


Pages: 1738-1742

Number of pages: 5

50 items in total

[41] Sequence-to-Sequence Multi-Modal Speech In-Painting
Elyaderani, Mahsa Kadkhodaei
Shirani, Shahram
INTERSPEECH 2023, 2023, : 829 - 833
[42] Abstractive Text Summarization: Enhancing Sequence-to-Sequence Models Using Word Sense Disambiguation and Semantic Content Generalization
Kouris, Panagiotis
Alexandridis, Georgios
Stafylopatis, Andreas
COMPUTATIONAL LINGUISTICS, 2021, 47 (04) : 813 - 859
[43] Generating Natural Answers on Knowledge Bases and Text by Sequence-to-Sequence Learning
Ye, Zhihao
Cai, Ruichu
Liao, Zhaohui
Hao, Zhifeng
Li, Jinfen
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139 : 447 - 455
[44] IMPROVING SEQUENCE-TO-SEQUENCE VOICE CONVERSION BY ADDING TEXT-SUPERVISION
Zhang, Jing-Xuan
Ling, Zhen-Hua
Jiang, Yuan
Liu, Li-Juan
Liang, Chen
Dai, Li-Rong
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6785 - 6789
[45] Turkish abstractive text summarization using pretrained sequence-to-sequence models
Baykara, Batuhan
Gungor, Tunga
NATURAL LANGUAGE ENGINEERING, 2023, 29 (05) : 1275 - 1304
[46] Enhancing the Quality of Nepali Text-to-Speech Systems
Ghimire, Rupak Raj
Bal, Bal Krishna
CREATIVITY IN INTELLIGENT TECHNOLOGIES AND DATA SCIENCE, (CIT&DS), 2017, 754 : 187 - 197
[47] Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models
Watson, Daniel
Zalmout, Nasser
Habash, Nizar
2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 837 - 843
[48] Hierarchical Sequence-to-Sequence Model for Multi-Label Text Classification
Yang, Zhenyu
Liu, Guojing
IEEE ACCESS, 2019, 7 : 153012 - 153020
[49] Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yoruba Language Text
Orife, Iroro Fred Onome
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2848 - 2852
[50] Denoising based Sequence-to-Sequence Pre-training for Text Generation
Wang, Liang
Zhao, Wei
Jia, Ruoyu
Li, Sujian
Liu, Jingming
2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4003 - 4015