Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

Cited by: 0

Authors:

Lazaridis, Alexandros [1]

Cernak, Milos [1]

Garner, Philip N. [1]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Funding:

Swiss National Science Foundation;

Keywords:

Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;

DOI:

10.21437/Interspeech.2016-258

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Amplitude demodulation (AM) is a signal decomposition technique by which a signal is decomposed into a product of two signals, i.e., a quickly varying carrier and a slowly varying modulator. In this work, probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. PAD is applied iteratively to generate syllable and stress amplitude modulations in a cascade manner. The PAD features are used as a secondary input alongside the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation shows that the proposed system using the PAD features mainly improves prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (F0). The significance of this improvement is validated by subjective evaluation of the overall speech quality in an ABX test, where the proposed system achieves a preference score of 38.6% against 19.5% for the baseline system.
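
The decomposition described in the abstract (signal = modulator × carrier, applied again to the modulator to obtain a slower modulation) can be illustrated with a minimal sketch. This is not PAD: it swaps in a plain rectify-and-smooth envelope estimator, and the function names, window widths, and test signal are all hypothetical choices made for illustration only.

```python
import math


def moving_average(x, width):
    """Centered moving average, truncated at the edges (a crude low-pass filter)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def demodulate(signal, width, eps=1e-8):
    """Return (modulator, carrier) such that modulator[i] * carrier[i] is signal[i].

    Assumption: the modulator is estimated by smoothing the rectified signal.
    PAD instead infers it probabilistically; that step is not reproduced here.
    """
    envelope = moving_average([abs(s) for s in signal], width)
    modulator = [m + eps for m in envelope]  # keep strictly positive
    carrier = [s / m for s, m in zip(signal, modulator)]
    return modulator, carrier


def cascade(signal, widths):
    """Demodulate repeatedly, feeding each stage's modulator to the next stage.

    Returns the slowest modulator and the list of carriers, one per stage.
    """
    carriers = []
    current = signal
    for width in widths:
        current, carrier = demodulate(current, width)
        carriers.append(carrier)
    return current, carriers


# Hypothetical test signal: a 100 Hz tone with a 4 Hz amplitude modulation,
# sampled at 1000 Hz for one second.
fs = 1000
sig = [
    (1 + 0.5 * math.sin(2 * math.pi * 4 * n / fs)) * math.sin(2 * math.pi * 100 * n / fs)
    for n in range(fs)
]
slow_modulator, carriers = cascade(sig, widths=[51, 301])
```

Multiplying the slowest modulator by every carrier recovers the input up to floating-point error, which is the product structure the abstract refers to. Longer windows in later stages yield progressively slower modulators, loosely analogous to the syllable and stress levels mentioned above.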


Pages: 2298-2302

Page count: 5

50 items in total

[1] ProZed: A speech prosody analysis-by-synthesis tool for linguists
Hirst, Daniel
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 15 - 18
[2] IMPROVING NATURALNESS AND CONTROLLABILITY OF SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS BY LEARNING LOCAL PROSODY REPRESENTATIONS
Gong, Cheng
Wang, Longbiao
Ling, Zhenhua
Guo, Shaotong
Zhang, Ju
Dang, Jianwu
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5724 - 5728
[3] AUTOMATIC PROSODY PREDICTION FOR CHINESE SPEECH SYNTHESIS USING BLSTM-RNN AND EMBEDDING FEATURES
Ding, Chuang
Xie, Lei
Yan, Jie
Zhang, Weini
Liu, Yang
2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 98 - 102
[4] Discourse Prosody and Its Application to Speech Synthesis
Hu, Na
Shao, Pengfei
Zu, Yiqing
Wang, Zuyan
Huang, Wei
Wang, Shijin
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
[5] Prosody modelling of Spanish for expressive speech synthesis
Iriondo, Ignasi
Socoro, Joan Claudi
Alias, Francesc
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 821 - +
[6] Combining conversational speech with read speech to improve prosody in Text-to-Speech synthesis
O'Mahony, Johannah
Lai, Catherine
King, Simon
INTERSPEECH 2022, 2022, : 3388 - 3392
[7] GRAPHPB: GRAPHICAL REPRESENTATIONS OF PROSODY BOUNDARY IN SPEECH SYNTHESIS
Sun, Aolan
Wang, Jianzong
Cheng, Ning
Peng, Huayi
Zeng, Zhen
Kong, Lingwei
Xiao, Jing
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 438 - 445
[8] Intonation and Prosody Conversion for Expressive Mandarin Speech Synthesis
Zhu, Jing
Yu, Yibiao
PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 549 - 552
[9] Expressive Prosody for Unit-selection Speech Synthesis
Strom, Volker
Clark, Robert
King, Simon
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1296 - 1299
[10] ON THE INTERPLAY BETWEEN SPARSITY, NATURALNESS, INTELLIGIBILITY, AND PROSODY IN SPEECH SYNTHESIS
Lai, Cheng-I Jeff
Cooper, Erica
Zhang, Yang
Chang, Shiyu
Qian, Kaizhi
Liao, Yi-Lun
Chuang, Yung-Sung
Liu, Alexander H.
Yamagishi, Junichi
Cox, David
Glass, James
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8447 - 8451
