MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Cited by: 0
Authors
Rosenberg, Andrew [1 ]
Fernandez, Raul [1 ]
Ramabhadran, Bhuvana [1 ]
Affiliation
[1] IBM Res AI, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
prosody prediction; speech synthesis; low resources;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system which, in most classical architectures, relies on the existence of a text-analysis processor that can extract prosody-predictive features and pass them to a statistical learning model. These features can range from basic properties of the input string to rich high-level features that may not always be available when developing a TTS system in a new language with sparse computational resources. In this work, we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets that assume access to a certain amount of rich resources. Using objective metrics, we investigate the effect of relaxing the assumptions on input representations for prosody prediction for five languages, and we evaluate the perceptual implications for US English.
Pages: 5114-5118
Page count: 5
Related papers
50 in total
  • [31] Multiple-prosody speech databases and their effectiveness in high-quality speech synthesis at arbitrary rates
    Masuda, T
    Toda, T
    Kawanami, H
    Saruwatari, H
    Shikano, K
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS, 2005, 88 (09): : 38 - 47
  • [32] Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
    Pan, Shifeng
    He, Lei
    INTERSPEECH 2021, 2021, : 4678 - 4682
  • [33] Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit
    Zeng, Zhen
    Wang, Jianzong
    Cheng, Ning
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 4422 - 4426
  • [34] Prosody-controllable gender-ambiguous speech synthesis: a tool for investigating implicit bias in speech perception
    Szekely, Eva
    Gustafson, Joakim
    Torre, Ilaria
    INTERSPEECH 2023, 2023, : 1234 - 1238
  • [35] Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees
    Secujski, Milan
    Pekar, Darko
    Jakovljevic, Niksa
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3164 - +
  • [36] Using Automatic Stress Extraction from Audio for Improved Prosody Modelling in Speech Synthesis
    Szaszak, Gyorgy
    Beke, Andras
    Olaszy, Gabor
    Toth, Balint Pal
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2227 - 2231
  • [37] Korean Prosody Phrase Boundary Prediction Model for Speech Synthesis Service in Smart Healthcare
    Kim, Minho
    Jung, Youngim
    Kwon, Hyuk-Chul
    ELECTRONICS, 2021, 10 (19)
  • [38] ROBUST AND FINE-GRAINED PROSODY CONTROL OF END-TO-END SPEECH SYNTHESIS
    Lee, Younggun
    Kim, Taesu
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5911 - 5915
  • [39] EmoSRE: Emotion prediction based speech synthesis and refined speech recognition using large language model and prosody encoding
    Akhouri, Shivam
    Balasundaram, Ananthakrishnan
    CURRENT PSYCHOLOGY, 2025, : 7250 - 7262
  • [40] PROSODIC MODELING IN SWEDISH SPEECH SYNTHESIS
    BRUCE, G
    GRANSTROM, B
    SPEECH COMMUNICATION, 1993, 13 (1-2) : 63 - 73