HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling

Cited by: 0

Authors:

Maeno, Yu [1]

Nose, Takashi [1]

Kobayashi, Takao [1]

Ijima, Yusuke [2]

Nakajima, Hideharu [2]

Mizuno, Hideyuki [2]

Yoshioka, Osamu [2]

Affiliations:

[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Tokyo, Japan

[2] NTT Corp, NTT Cyber Space Labs, Miami, FL USA

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

HMM-based speech synthesis; expressive speech; emphasis expression; unsupervised labeling; F0; generation; EMPHASIS;

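DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: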

This paper describes an approach to HMM-based expressive speech synthesis that does not require any supervised labeling process for emphasis context. We use appealing-style speech whose sentences were taken from real domains. To reduce the cost of labeling speech data with an emphasis context for model training, we propose an unsupervised labeling technique for the emphasis context based on the difference between the original and generated F0 patterns of the training sentences. Although the criterion for the emphasis labeling is quite simple, subjective evaluation results reveal that the unsupervised labeling is comparable to labeling conducted carefully by a human in terms of speech naturalness and emphasis reproducibility.
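The labeling idea in the abstract (comparing original and generated F0 patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact criterion, so the per-unit mean log-F0 difference, the threshold value, the unit granularity, and the name `label_emphasis` are all assumptions made for illustration.

```python
from statistics import mean


def label_emphasis(original_f0, generated_f0, threshold=0.1):
    """Hypothetical emphasis labeler (illustrative only).

    original_f0, generated_f0: lists of units (e.g. phrases), each unit a
    list of voiced-frame log-F0 values. A unit is marked as emphasized when
    the original mean log-F0 exceeds the generated mean by more than
    `threshold`. The criterion and the threshold are assumptions, not
    values taken from the paper.
    """
    labels = []
    for orig_unit, gen_unit in zip(original_f0, generated_f0):
        # Means are used so units may have different frame counts.
        diff = mean(orig_unit) - mean(gen_unit)
        labels.append(diff > threshold)
    return labels


# Toy log-F0 values for three units (made-up numbers).
original = [[5.0, 5.1, 5.2], [5.4, 5.5], [4.9, 5.0]]
generated = [[5.0, 5.1, 5.1], [5.1, 5.2], [5.0, 5.0]]
labels = label_emphasis(original, generated)
```

In this toy input only the second unit has an original F0 well above the generated one, so it is the only unit that would receive the emphasis label.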

Citation

Pages: 1860 / +

Page count: 2

13 entries in total

[1] [Anonymous], 1999, P EUROSPEECH

[2] Badino L., 2009, P INT BRIGHT UK, P520

[3] Brenier J.M., 2005, Proceedings of Eurospeech, Lisbon, Portugal, P3297

[4] Kawahara, H; Masuda-Katsuse, I; de Cheveigné, A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds [J]. SPEECH COMMUNICATION, 1999, 27 (3-4): 187-207

[5] Ladd, DR; Morton, R. The perception of intonational emphasis: continuous or categorical? [J]. JOURNAL OF PHONETICS, 1997, 25 (03): 313-342

[6] Morizane, Kumiko; Nakamura, Keigo; Toda, Tomoki; Saruwatari, Hiroshi; Shikano, Kiyohiro. Emphasized Speech Synthesis Based on Hidden Markov Models [J]. ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009: 76-81

[7] Nakajima H., 2010, P OR COCOSDA, P30

[8] Shinoda K., 2000, Journal of the Acoustical Society of Japan (E), V21, P79, DOI 10.1250/ast.21.79

[9] Xu J, 2009, LECT NOTES COMPUT SC, V5754, P177

[10] Yamagishi, J; Onishi, K; Masuko, T; Kobayashi, T. Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (03): 502-509
