MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Times cited: 0

Authors:

Rosenberg, Andrew [1]

Fernandez, Raul [1]

Ramabhadran, Bhuvana [1]

Affiliation:

[1] IBM Research AI, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

prosody prediction; speech synthesis; low resources

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system which, in most classical architectures, relies on a text-analysis processor that extracts prosody-predictive features and passes them to a statistical learning model. These features can range from basic properties of the input string to rich high-level features, which may not always be available when developing a TTS system in a new language with sparse computational resources. In this work we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets, each of which assumes access to a different amount of rich resources. Using objective metrics, we investigate the effect of relaxing the assumptions on input representations for prosody prediction in five languages, and we evaluate the perceptual implications for US English.

Pages: 5114-5118

Number of pages: 5
