MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Cited by: 0

Authors:

Rosenberg, Andrew [1]

Fernandez, Raul [1]

Ramabhadran, Bhuvana [1]

Affiliation:

[1] IBM Res AI, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

prosody prediction; speech synthesis; low resources

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system, which, in most classical architectures, relies on a text-analysis processor that extracts prosody-predictive features and passes them to a statistical learning model. These features can range from basic properties of the input string to rich high-level features that may not always be available when developing a TTS system in a new language with sparse computational resources. In this work we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets that assume access to a certain amount of rich resources. Using objective metrics, we investigate the effect of relaxing the assumptions on input representations for prosody prediction in five languages, and we evaluate the perceptual implications for US English.

Pages: 5114-5118

Page count: 5
