MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Cited by: 0
Authors
Rosenberg, Andrew [1 ]
Fernandez, Raul [1 ]
Ramabhadran, Bhuvana [1 ]
Affiliations
[1] IBM Res AI, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
prosody prediction; speech synthesis; low resources
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system which, in most classical architectures, relies on the existence of a text-analysis processor that can extract prosody-predictive features and pass them to a statistical learning model. These features can range from basic properties of the input string to rich high-level features which may not always be available when developing a TTS system in a new language with sparse computational resources. In this work we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets that assume access to a certain amount of rich resources. We investigate, using objective metrics, the effect of relaxing the assumptions on input representations for prosody prediction for 5 languages, and evaluate the perceptual implications for US English.
Pages: 5114 - 5118
Page count: 5