Modeling the Creaky Excitation for Parametric Speech Synthesis

Times cited: 0

Authors:

Drugman, Thomas [1]

Kane, John [1]

Gobl, Christer [1]

Affiliation:

[1] Univ Mons, TCTS Lab, B-7000 Mons, Belgium

Source:

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012

Keywords:

Voice quality; speech synthesis; creak; vocal fry

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In order to produce natural-sounding output, corpus-based speech synthesis systems need to model the acoustic variability in the corpus properly. Creaky voice is a voice quality frequently produced in many languages, in both read and conversational speech. However, the creaky excitation displays acoustic characteristics different from those of modal excitation and is, hence, not suitably modelled by standard vocoders. This study presents an analysis of the creaky excitation, which is used to derive an extension of the Deterministic plus Stochastic Model (DSM) of the residual signal. The proposed model is designed to represent creaky voice appropriately and is integrated into a vocoder for parametric speech synthesis. Copy-synthesis versions of short speech segments containing creaky voice were used in a subjective listening test, which revealed a clearly better rendering of the voice quality than with a standard vocoder.
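The abstract names the Deterministic plus Stochastic Model (DSM) of the residual signal without describing it. As background only, and not as the creaky-voice extension proposed in this paper, the sketch below illustrates the deterministic part of the baseline DSM as it is generally described: residual frames centred on glottal closure instants (GCIs), normalised to a common length, windowed and scaled to unit energy, are stacked, and their first principal component (the "eigenresidual") serves as the deterministic excitation waveform. The sketch assumes GCI detection and pitch-normalising resampling have already been done. The function names `normalise_frame` and `eigenresidual` are hypothetical, and taking the dominant eigenvector of the uncentred frame correlation matrix by power iteration is an illustrative choice, not the paper's procedure.

```python
import math
from typing import List, Sequence


def normalise_frame(frame: Sequence[float]) -> List[float]:
    """Apply a symmetric Hann window and scale the frame to unit energy.

    Hypothetical helper: assumes the frame is already GCI-centred and
    resampled to a fixed, pitch-normalised length.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    energy = math.sqrt(sum(x * x for x in windowed))
    if energy == 0.0:
        return windowed  # silent frame: nothing to scale
    return [x / energy for x in windowed]


def eigenresidual(frames: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Dominant eigenvector of the uncentred frame correlation matrix.

    Hypothetical helper: power iteration stands in for a full PCA, and
    no mean is subtracted (an assumption of this sketch).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all frames must share one normalised length")

    # Accumulate R[i][j] = sum over frames of f[i] * f[j].
    R = [[0.0] * n for _ in range(n)]
    for f in frames:
        for i in range(n):
            fi = f[i]
            row = R[i]
            for j in range(n):
                row[j] += fi * f[j]

    # Start from the mean frame so the result keeps the frames' polarity.
    v = [sum(f[i] for f in frames) / len(frames) for i in range(n)]
    if not any(v):
        v = [1.0] * n
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    # Power iteration: repeatedly apply R and renormalise to unit length.
    for _ in range(iters):
        w = [sum(r * x for r, x in zip(row, v)) for row in R]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero frames: keep the current unit vector
        v = [x / norm for x in w]
    return v
```

Given a list of equal-length residual frames, `[normalise_frame(f) for f in raw_frames]` followed by `eigenresidual(...)` returns a unit-energy waveform. In the baseline DSM this waveform is resampled to the target pitch period for the low-frequency band, while a noise component covers the higher band; those synthesis steps are not shown here.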


Pages: 1422-1425

Number of pages: 4
