Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

被引:0
|
作者
Lazaridis, Alexandros [1 ]
Cernak, Milos [1 ]
Garner, Philip N. [1 ]
机构
[1] Idiap Res Inst, Martigny, Switzerland
来源
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016年
基金
瑞士国家科学基金会;
关键词
Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;
D O I
10.21437/Interspeech.2016-258
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Amplitude demodulation (AM) is a signal decomposition technique by which a signal can be decomposed to a product of two signals, i.e, a quickly varying carrier and a slowly varying modulator. In this work, the probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. The PAD is applied iteratively for generating syllable and stress amplitude modulations in a cascade manner. The PAD features are used as a secondary input scheme along with the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation has shown that the proposed system using the PAD features has improved mainly prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (FO). The significance of this improvement is validated by subjective evaluation of the overall speech quality, achieving 38.6% over 19.5% preference score in respect to the baseline system, in an ABX test.
引用
收藏
页码:2298 / 2302
页数:5
相关论文
共 50 条
  • [11] MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS
    Rosenberg, Andrew
    Fernandez, Raul
    Ramabhadran, Bhuvana
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5114 - 5118
  • [12] Prosody evaluation for embedded slovene speech-synthesis systems
    Mihelic, France
    Vesnicer, Bostjan
    Zibert, Janez
    Noeth, Elmar
    INFORMACIJE MIDEM-JOURNAL OF MICROELECTRONICS ELECTRONIC COMPONENTS AND MATERIALS, 2007, 37 (03): : 176 - 181
  • [13] Feedback Loop for Prosody Prediction in Concatenative Speech Synthesis.
    Latorre, Javier
    Gracia, Sergio
    Akamine, Masami
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2027 - 2030
  • [14] Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis
    Windmann, Andreas
    Jauk, Igor
    Tamburini, Fabio
    Wagner, Petra
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 332 - +
  • [15] Word-level Text Markup for Prosody Control in Speech Synthesis
    Korotkova, Yuliya
    Kalinovskiy, Ilya
    Vakhrusheva, Tatiana
    INTERSPEECH 2024, 2024, : 2280 - 2284
  • [16] DiffProsody: Diffusion-Based Latent Prosody Generation for Expressive Speech Synthesis With Prosody Conditional Adversarial Training
    Oh, Hyung-Seok
    Lee, Sang-Hoon
    Lee, Seong-Whan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2654 - 2666
  • [17] INTERACTIVE MULTI-LEVEL PROSODY CONTROL FOR EXPRESSIVE SPEECH SYNTHESIS
    Cornille, Tobias
    Wang, Fengna
    Bekker, Jessa
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8312 - 8316
  • [18] Improving Speech Synthesis by Automatic Speech Recognition and Speech Discriminator
    Huang, Li-Yu
    Chen, Chia-Ping
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2024, 40 (01) : 189 - 200
  • [19] A modular holistic approach to prosody modelling for Standard Yoruba speech synthesis
    Qdejobi, Odetunji A.
    Wong, Shun Ha Sylvia
    Beaumont, Anthony J.
    COMPUTER SPEECH AND LANGUAGE, 2008, 22 (01) : 39 - 68
  • [20] Study and Implementation of Prosody Manipulation Method For Indonesian Speech Synthesis System
    Prini, Salita Ulitia
    Prihatmanto, Ary Setijadi
    Jatmiko, Didit Andri
    2018 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY SYSTEMS AND INNOVATION (ICITSI), 2018, : 121 - 126