Modeling the Creaky Excitation for Parametric Speech Synthesis

Times cited: 0
Authors
Drugman, Thomas [1]
Kane, John [1]
Gobl, Christer [1]
Affiliation
[1] Univ Mons, TCTS Lab, B-7000 Mons, Belgium
Source
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012
Keywords
Voice quality; speech synthesis; creak; vocal fry
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To produce natural-sounding output, corpus-based speech synthesis systems need to model the acoustic variability present in the corpus properly. Creaky voice is a voice quality frequently produced in many languages, in both read and conversational speech. However, the creaky excitation displays acoustic characteristics different from those of modal excitation and is therefore not suitably modelled by standard vocoders. This study presents an analysis of the creaky excitation, which is used to derive an extension of the Deterministic plus Stochastic Model (DSM) of the residual signal. The proposed model is designed to model creaky voice appropriately and is integrated into a vocoder for parametric speech synthesis. Copy-synthesis versions of short speech segments containing creaky voice were used in a subjective listening test, which revealed a clearly better rendering of the voice quality than with a standard vocoder.
Pages: 1422-1425
Number of pages: 4