MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Authors
Rosenberg, Andrew [1 ]
Fernandez, Raul [1 ]
Ramabhadran, Bhuvana [1 ]
Affiliations
[1] IBM Res AI, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
prosody prediction; speech synthesis; low resources
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system which, in most classical architectures, relies on the existence of a text-analysis processor that can extract prosody-predictive features and pass them to a statistical learning model. These features can range from basic properties of the input string to rich high-level features, which may not always be available when developing a TTS system in a new language with sparse computational resources. In this work, we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets that assume access to a certain amount of rich resources. We investigate, using objective metrics, the effect of relaxing the assumptions on input representations for prosody prediction for five languages, and evaluate the perceptual implications for US English.
Pages: 5114-5118
Page count: 5