MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Cited by: 0
Authors
Rosenberg, Andrew [1]
Fernandez, Raul [1]
Ramabhadran, Bhuvana [1]
Affiliations
[1] IBM Res AI, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
prosody prediction; speech synthesis; low resources
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system which, in most classical architectures, relies on a text-analysis processor that can extract prosody-predictive features and pass them to a statistical learning model. These features can range from basic properties of the input string to rich high-level features, which may not always be available when developing a TTS system in a new language with sparse computational resources. In this work we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets, each of which assumes access to a certain amount of rich resources. Using objective metrics, we investigate the effect of relaxing the assumptions on input representations for prosody prediction in five languages, and we evaluate the perceptual implications for US English.
Pages: 5114 - 5118
Number of pages: 5
Related papers
50 in total
  • [21] Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis
    Windmann, Andreas
    Jauk, Igor
    Tamburini, Fabio
    Wagner, Petra
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 332 - +
  • [22] Word-level Text Markup for Prosody Control in Speech Synthesis
    Korotkova, Yuliya
    Kalinovskiy, Ilya
    Vakhrusheva, Tatiana
    INTERSPEECH 2024, 2024, : 2280 - 2284
  • [23] DiffProsody: Diffusion-Based Latent Prosody Generation for Expressive Speech Synthesis With Prosody Conditional Adversarial Training
    Oh, Hyung-Seok
    Lee, Sang-Hoon
    Lee, Seong-Whan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2654 - 2666
  • [24] INTERACTIVE MULTI-LEVEL PROSODY CONTROL FOR EXPRESSIVE SPEECH SYNTHESIS
    Cornille, Tobias
    Wang, Fengna
    Bekker, Jessa
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8312 - 8316
  • [25] A modular holistic approach to prosody modelling for Standard Yoruba speech synthesis
    Odejobi, Odetunji A.
    Wong, Shun Ha Sylvia
    Beaumont, Anthony J.
    COMPUTER SPEECH AND LANGUAGE, 2008, 22 (01) : 39 - 68
  • [26] Study and Implementation of Prosody Manipulation Method For Indonesian Speech Synthesis System
    Prini, Salita Ulitia
    Prihatmanto, Ary Setijadi
    Jatmiko, Didit Andri
    2018 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY SYSTEMS AND INNOVATION (ICITSI), 2018, : 121 - 126
  • [27] UNSUPERVISED WORD-LEVEL PROSODY TAGGING FOR CONTROLLABLE SPEECH SYNTHESIS
    Guo, Yiwei
    Du, Chenpeng
    Yu, Kai
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7597 - 7601
  • [28] Phonetics and Machine Learning: Hierarchical Modelling of Prosody in Statistical Speech Synthesis
    Vainio, Martti
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014, 2014, 8791 : 37 - 54
  • [29] Technical and Phonetic Aspects of Speech Quality Assessment: The Case of Prosody Synthesis
    Tuckova, Jana
    Holub, Jan
    Dubeda, Tomas
    CROSS-MODAL ANALYSIS OF SPEECH, GESTURES, GAZE AND FACIAL EXPRESSIONS, 2009, 5641 : 126 - +
  • [30] Eye Tracking for the Online Evaluation of Prosody in Speech Synthesis: Not So Fast!
    White, Michael
    Rajkumar, Rajakrishnan
    Ito, Kiwako
    Speer, Shari R.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2491 - 2494