Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

被引:0
|
作者
Lazaridis, Alexandros [1 ]
Cernak, Milos [1 ]
Garner, Philip N. [1 ]
机构
[1] Idiap Res Inst, Martigny, Switzerland
来源
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016年
基金
瑞士国家科学基金会;
关键词
Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;
D O I
10.21437/Interspeech.2016-258
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Amplitude demodulation (AM) is a signal decomposition technique by which a signal can be decomposed to a product of two signals, i.e, a quickly varying carrier and a slowly varying modulator. In this work, the probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. The PAD is applied iteratively for generating syllable and stress amplitude modulations in a cascade manner. The PAD features are used as a secondary input scheme along with the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation has shown that the proposed system using the PAD features has improved mainly prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (FO). The significance of this improvement is validated by subjective evaluation of the overall speech quality, achieving 38.6% over 19.5% preference score in respect to the baseline system, in an ABX test.
引用
收藏
页码:2298 / 2302
页数:5
相关论文
共 50 条
  • [1] ProZed: A speech prosody analysis-by-synthesis tool for linguists
    Hirst, Daniel
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 15 - 18
  • [2] IMPROVING NATURALNESS AND CONTROLLABILITY OF SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS BY LEARNING LOCAL PROSODY REPRESENTATIONS
    Gong, Cheng
    Wang, Longbiao
    Ling, Zhenhua
    Guo, Shaotong
    Zhang, Ju
    Dang, Jianwu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5724 - 5728
  • [3] AUTOMATIC PROSODY PREDICTION FOR CHINESE SPEECH SYNTHESIS USING BLSTM-RNN AND EMBEDDING FEATURES
    Ding, Chuang
    Xie, Lei
    Yan, Jie
    Zhang, Weini
    Liu, Yang
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 98 - 102
  • [4] Discourse Prosody and Its Application to Speech Synthesis
    Hu, Na
    Shao, Pengfei
    Zu, Yiqing
    Wang, Zuyan
    Huang, Wei
    Wang, Shijin
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [5] Prosody modelling of Spanish for expressive speech synthesis
    Iriondo, Ignasi
    Socoro, Joan Claudi
    Alias, Francesc
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 821 - +
  • [6] Combining conversational speech with read speech to improve prosody in Text-to-Speech synthesis
    O'Mahony, Johannah
    Lai, Catherine
    King, Simon
    INTERSPEECH 2022, 2022, : 3388 - 3392
  • [7] GRAPHPB: GRAPHICAL REPRESENTATIONS OF PROSODY BOUNDARY IN SPEECH SYNTHESIS
    Sun, Aolan
    Wang, Jianzong
    Cheng, Ning
    Peng, Huayi
    Zeng, Zhen
    Kong, Lingwei
    Xiao, Jing
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 438 - 445
  • [8] Intonation and Prosody Conversion for Expressive Mandarin Speech Synthesis
    Zhu, Jing
    Yu, Yibiao
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 549 - 552
  • [9] Expressive Prosody for Unit-selection Speech Synthesis
    Strom, Volker
    Clark, Robert
    King, Simon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1296 - 1299
  • [10] ON THE INTERPLAY BETWEEN SPARSITY, NATURALNESS, INTELLIGIBILITY, AND PROSODY IN SPEECH SYNTHESIS
    Lai, Cheng-I Jeff
    Cooper, Erica
    Zhang, Yang
    Chang, Shiyu
    Qian, Kaizhi
    Liao, Yi-Lun
    Chuang, Yung-Sung
    Liu, Alexander H.
    Yamagishi, Junichi
    Cox, David
    Glass, James
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8447 - 8451