PROSODIC REPRESENTATION LEARNING AND CONTEXTUAL SAMPLING FOR NEURAL TEXT-TO-SPEECH

Cited by: 10

Authors:

Karlapati, Sri [1]

Abbas, Ammar [1]

Hodari, Zack [2]

Moinet, Alexis [1]

Joly, Arnaud [1]

Karanasou, Penny [1]

Drugman, Thomas [1]

Affiliations:

[1] Amazon Res, Cambridge, England

[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

TTS; prosody modelling; contextual prosody;

DOI:

10.1109/ICASSP39728.2021.9413696

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we introduce Kathaka, a model trained with a novel two-stage training process for neural speech synthesis with contextually appropriate prosody. In Stage I, we learn a prosodic distribution at the sentence level from mel-spectrograms available during training. In Stage II, we propose a novel method to sample from this learnt prosodic distribution using the contextual information available in text. To do this, we use BERT on text, and graph-attention networks on parse trees extracted from text. We show a statistically significant relative improvement of 13.2% in naturalness over a strong baseline when compared to recordings. We also conduct an ablation study on variations of our sampling technique, and show a statistically significant improvement over the baseline in each case.


Pages: 6573-6577

Page count: 5
