Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

Cited: 0
Authors
Lazaridis, Alexandros [1]
Cernak, Milos [1]
Garner, Philip N. [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Funding
Swiss National Science Foundation;
Keywords
Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;
DOI
10.21437/Interspeech.2016-258
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Amplitude demodulation (AM) is a signal decomposition technique by which a signal can be decomposed into a product of two signals, i.e., a quickly varying carrier and a slowly varying modulator. In this work, probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. PAD is applied iteratively to generate syllable- and stress-level amplitude modulations in a cascade manner. The PAD features are used as a secondary input scheme along with the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation has shown that the proposed system using the PAD features mainly improves prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (F0). The significance of this improvement is validated by subjective evaluation of overall speech quality, achieving a 38.6% versus 19.5% preference score with respect to the baseline system in an ABX test.
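As a rough illustration of the decomposition the abstract describes (signal ≈ slowly varying modulator × quickly varying carrier), the sketch below performs a simple Hilbert-envelope amplitude demodulation in Python. This is not the paper's probabilistic (PAD) method, which fits the modulator with a probabilistic model and applies the demodulation iteratively in a cascade over syllable and stress time scales; it only shows the carrier/modulator factorization itself. All names here are illustrative.

```python
# Minimal Hilbert-envelope amplitude demodulation sketch (NOT the PAD method
# from the paper): factor a signal into a slowly varying modulator (envelope)
# and a quickly varying carrier, with x ~= modulator * carrier.
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (one-sided spectrum doubling)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def demodulate(x, eps=1e-12):
    """Return (modulator, carrier) so that x ~= modulator * carrier."""
    env = np.abs(analytic_signal(x))       # slowly varying modulator
    carrier = x / np.maximum(env, eps)     # quickly varying carrier
    return env, carrier

# Example: a 10 Hz tone modulated by a slow 1 Hz envelope.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
mod = 1.0 + 0.5 * np.sin(2 * np.pi * 1.0 * t)   # ground-truth modulator
x = mod * np.sin(2 * np.pi * 10.0 * t)
env, carrier = demodulate(x)
print(np.max(np.abs(env * carrier - x)))  # reconstruction error (near zero)
```

Because the modulator here is strictly band-limited well below the carrier frequency, the Hilbert envelope recovers it almost exactly; PAD instead poses the modulator as a latent variable and infers it probabilistically, which lets the same machinery be cascaded across time scales (syllable, then stress).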
Pages: 2298-2302
Page count: 5
Related Papers
50 records in total
  • [31] Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
    Pan, Shifeng
    He, Lei
    INTERSPEECH 2021, 2021, : 4678 - 4682
  • [32] Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit
    Zeng, Zhen
    Wang, Jianzong
    Cheng, Ning
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 4422 - 4426
  • [33] ARTICULATORY FEATURES FOR EXPRESSIVE SPEECH SYNTHESIS
    Black, Alan W.
    Bunnell, H. Timothy
    Dou, Ying
    Muthukumar, Prasanna Kumar
    Metze, Florian
    Perry, Daniel
    Polzehl, Tim
    Prahallad, Kishore
    Steidl, Stefan
    Vaughn, Callie
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4005 - 4008
  • [34] Effectiveness of Speech Mode Adaptation for Improving Dialogue Speech Synthesis
    Kaya, Kazuki
    Mori, Hiroki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (10): : 2064 - 2066
  • [35] Prosody-controllable gender-ambiguous speech synthesis: a tool for investigating implicit bias in speech perception
    Szekely, Eva
    Gustafson, Joakim
    Torre, Ilaria
    INTERSPEECH 2023, 2023, : 1234 - 1238
  • [36] Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees
    Secujski, Milan
    Pekar, Darko
    Jakovljevic, Niksa
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3164 - +
  • [37] Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis
    Peng, Yukun
    Ling, Zhenhua
    INTERSPEECH 2022, 2022, : 4257 - 4261
  • [38] Using Automatic Stress Extraction from Audio for Improved Prosody Modelling in Speech Synthesis
    Szaszak, Gyorgy
    Beke, Andras
    Olaszy, Gabor
    Toth, Balint Pal
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2227 - 2231
  • [39] Fine-grained prosody modeling in neural speech synthesis using ToBI representation
    Zou, Yuxiang
    Liu, Shichao
    Yin, Xiang
    Lin, Haopeng
    Wang, Chunfeng
    Zhang, Haoyu
    Ma, Zejun
    INTERSPEECH 2021, 2021, : 3146 - 3150
  • [40] Excitation modelling using epoch features for statistical parametric speech synthesis
    Reddy, M. Kiran
    Rao, K. Sreenivasa
    COMPUTER SPEECH AND LANGUAGE, 2020, 60