Two-Stage Prosody Prediction for Emotional Text-to-Speech Synthesis

Cited by: 0

Authors:

Tang, Hao [1]

Zhou, Xi [1]

Odisio, Matthias [1]

Hasegawa-Johnson, Mark [1]

Huang, Thomas S. [1]

Affiliations:

[1] Univ Illinois, Urbana, IL USA

Source:

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008

Keywords:

TTS; speech synthesis; prosody prediction; CART; dynamic programming;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we adopt a difference approach to prosody prediction for emotional text-to-speech synthesis, where the prosodic variations between emotional and neutral speech are decomposed into the global and local prosodic variations and predicted using a two-stage model. The global prosodic variations are modeled by the means and standard deviations of the prosodic parameters, while the local prosodic variations are modeled by the classification and regression tree (CART) and dynamic programming. The proposed two-stage prosody prediction model has been successfully implemented as a prosodic module in a Festival-MBROLA architecture based emotional text-to-speech synthesis system, which is able to synthesize highly intelligible, natural and expressive speech.
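As a rough illustration of the first (global) stage only, the sketch below rescales a neutral prosodic contour so that its mean and standard deviation match target emotional statistics. The abstract does not give the exact transformation, so the z-score mapping, the function name `global_prosody_transform`, and all numeric values here are assumptions made for illustration, not the authors' implementation. The local stage (CART and dynamic programming) is not sketched.

```python
from statistics import mean, pstdev


def global_prosody_transform(neutral, target_mean, target_std):
    """Shift and scale a neutral contour to match target mean and std.

    Hypothetical helper: assumes the global variation can be applied as a
    simple z-score mapping, which the abstract does not confirm.
    """
    mu = mean(neutral)
    sigma = pstdev(neutral)  # population standard deviation
    if sigma == 0:
        # A flat contour carries no variation to rescale.
        return [target_mean for _ in neutral]
    return [target_mean + target_std * (x - mu) / sigma for x in neutral]


# Made-up neutral F0 values in Hz and made-up emotional target statistics.
neutral_f0 = [110.0, 120.0, 130.0, 125.0, 115.0]
emotional_f0 = global_prosody_transform(
    neutral_f0, target_mean=180.0, target_std=20.0
)
```

The same mapping could in principle be applied to other prosodic parameters such as duration or energy, each with its own target statistics.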


Pages: 2138-2141

Page count: 4
