Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

被引:0
|
作者
Lazaridis, Alexandros [1 ]
Cernak, Milos [1 ]
Garner, Philip N. [1 ]
机构
[1] Idiap Res Inst, Martigny, Switzerland
来源
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016年
基金
瑞士国家科学基金会;
关键词
Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;
D O I
10.21437/Interspeech.2016-258
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Amplitude demodulation (AM) is a signal decomposition technique by which a signal can be decomposed to a product of two signals, i.e, a quickly varying carrier and a slowly varying modulator. In this work, the probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. The PAD is applied iteratively for generating syllable and stress amplitude modulations in a cascade manner. The PAD features are used as a secondary input scheme along with the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation has shown that the proposed system using the PAD features has improved mainly prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (FO). The significance of this improvement is validated by subjective evaluation of the overall speech quality, achieving 38.6% over 19.5% preference score in respect to the baseline system, in an ABX test.
引用
收藏
页码:2298 / 2302
页数:5
相关论文
共 50 条
  • [41] Improving Speech Synthesis of Machine Translation Output
    Parlikar, Alok
    Black, Alan W.
    Vogel, Stephan
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 194 - 197
  • [42] Korean Prosody Phrase Boundary Prediction Model for Speech Synthesis Service in Smart Healthcare
    Kim, Minho
    Jung, Youngim
    Kwon, Hyuk-Chul
    ELECTRONICS, 2021, 10 (19)
  • [43] ROBUST AND FINE-GRAINED PROSODY CONTROL OF END-TO-END SPEECH SYNTHESIS
    Lee, Younggun
    Kim, Taesu
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5911 - 5915
  • [44] EmoSRE: Emotion prediction based speech synthesis and refined speech recognition using large language model and prosody encoding
    Akhouri, Shivam
    Balasundaram, Ananthakrishnan
    CURRENT PSYCHOLOGY, 2025, : 7250 - 7262
  • [45] Improving Adaptive Learning Models Using Prosodic Speech Features
    Wilschut, Thomas
    Sense, Florian
    Scharenborg, Odette
    van Rijn, Hedderik
    ARTIFICIAL INTELLIGENCE IN EDUCATION, AIED 2023, 2023, 13916 : 255 - 266
  • [46] Areal and Phylogenetic Features for Multilingual Speech Synthesis
    Gutkin, Alexander
    Sproat, Richard
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2078 - 2082
  • [47] Unsupervised features from text for speech synthesis in a speech-to-speech translation system
    Watts, Oliver
    Zhou, Bowen
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2164 - 2167
  • [48] Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis
    Sakai, Shinsuke
    Kawahara, Tatsuya
    Kawai, Hisashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (10): : 2006 - 2014
  • [49] Perceptual Relevance of Pitch Contours of Mandarin Tones and its Efficacy in Prosody Generation of Speech Synthesis
    Chen, Shi-Han
    Kuo, Chih-Chung
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2792 - 2795
  • [50] Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency
    Hirose, Keikichi
    Ochi, Keiko
    Mihara, Ryusuke
    Hashimoto, Hiroya
    Saito, Daisuke
    Minematsu, Nobuaki
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2804 - +