Enhancing Sequence-to-Sequence Text-to-Speech with Morphology

Cited by: 3

Authors:

Taylor, Jason [1]

Richmond, Korin [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

INTERSPEECH 2020 | 2020

Keywords:

Speech Synthesis; Sequence-to-Sequence; Morphology; Pronunciation

DOI:

10.21437/Interspeech.2020-1547

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Neural sequence-to-sequence (S2S) modelling encodes a single, unified representation for each input sequence. When used for text-to-speech synthesis (TTS), such representations must embed ambiguities between English spelling and pronunciation. For example, in pothole and there the character sequence th sounds different. This can be problematic when predicting pronunciation directly from letters. We posit pronunciation becomes easier to predict when letters are grouped into subword units like morphemes (e.g. a boundary lies between t and h in pothole but not there). Moreover, morphological boundaries can reduce the total number of, and increase the counts of, seen unit subsequences. Accordingly, we test here the effect of augmenting input sequences of letters with morphological boundaries. We find morphological boundaries substantially lower the Word and Phone Error Rates (WER and PER) for a Bi-LSTM performing G2P on one hand, and also increase the naturalness scores of Tacotrons performing TTS in a MUSHRA listening test on the other. The improvements to TTS quality are such that grapheme input augmented with morphological boundaries outperforms phone input without boundaries. Since morphological segmentation may be predicted with high accuracy, we highlight this simple pre-processing step has important potential for S2S modelling in TTS.
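The input augmentation described in the abstract can be illustrated with a minimal sketch. It assumes a morphological segmentation is already available for each word and inserts a boundary token between adjacent morphemes in the letter sequence. The boundary symbol `|`, the function name `augment_with_boundaries`, and the hand-written segmentations are all hypothetical; the abstract does not specify the symbol or the segmenter the authors used.

```python
# Hypothetical boundary symbol; the abstract does not say which token is used.
BOUNDARY = "|"


def augment_with_boundaries(morphemes):
    """Return the letters of a word with a boundary token between morphemes."""
    tokens = []
    for i, morpheme in enumerate(morphemes):
        if i > 0:
            tokens.append(BOUNDARY)  # mark the morpheme boundary
        tokens.extend(morpheme)      # one token per letter
    return tokens


# Hand-written segmentations, for illustration only.
segmentations = {"pothole": ["pot", "hole"], "there": ["there"]}

for word, morphemes in segmentations.items():
    print(word, "->", " ".join(augment_with_boundaries(morphemes)))
# pothole -> p o t | h o l e
# there -> t h e r e
```

In this sketch the letters t and h in pothole end up separated by the boundary token, while in there they remain adjacent, which is the distinction the abstract argues makes pronunciation easier to predict.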

Pages: 1738-1742

Page count: 5

Related records: 50 in total

[21] GRAPHTTS: GRAPH-TO-SEQUENCE MODELLING IN NEURAL TEXT-TO-SPEECH
Sun, Aolan
Wang, Jianzong
Cheng, Ning
Peng, Huayi
Zeng, Zhen
Xiao, Jing
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6719 - 6723
[22] On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
Irie, Kazuki
Prabhavalkar, Rohit
Kannan, Anjuli
Bruguier, Antoine
Rybach, David
Nguyen, Patrick
INTERSPEECH 2019, 2019, : 3800 - 3804
[23] Neural Abstractive Text Summarization with Sequence-to-Sequence Models
Shi, Tian
Keneshloo, Yaser
Ramakrishnan, Naren
Reddy, Chandan K.
ACM/IMS Transactions on Data Science, 2021, 2 (01):
[24] A Sequence-to-Sequence Pronunciation Model for Bangla Speech Synthesis
Ahmad, Arif
Hussain, Mohammed Raihan
Selim, Mohammad Reza
Iqbal, Muhammed Zafar
Rahman, Mohammad Shahidur
2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018,
[25] Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems
Karafiat, Martin
Baskar, Murali Karthick
Watanabe, Shinji
Hori, Takaaki
Wiesner, Matthew
Cernocky, Jan Honza
INTERSPEECH 2019, 2019, : 2220 - 2224
[26] SUPERVISED ATTENTION IN SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
Yang, Gene-Ping
Tang, Hao
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7222 - 7226
[27] Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System
Shahamiri, Seyed Reza
Lal, Vanshika
Shah, Dhvani
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 3407 - 3416
[28] Enhancing Sequence-to-Sequence Neural Lemmatization with External Resources
Milintsevich, Kirill
Sirts, Kairit
16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 3112 - 3122
[29] INVESTIGATION OF AN INPUT SEQUENCE ON THAI NEURAL SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS
Janyoi, Pongsathon
Thangthai, Ausdang
2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 218 - 223
[30] Sequence-to-Sequence Models Can Directly Translate Foreign Speech
Weiss, Ron J.
Chorowski, Jan
Jaitly, Navdeep
Wu, Yonghui
Chen, Zhifeng
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2625 - 2629
