Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis

Cited by: 0
Authors
Maia, Ranniery [1]
Gales, Mark J. F. [1]
Stylianou, Yannis [1]
Akamine, Masami [2]
Affiliations
[1] Cambridge Res Lab, Toshiba Res Europe Ltd, Cambridge, England
[2] Toshiba Co Ltd, Corp Res & Dev, Kawasaki, Kanagawa, Japan
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
Speech synthesis; statistical parametric speech synthesis; complex cepstrum; cepstral analysis; discrete Fourier transform
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an approach to complex cepstrum analysis based on the minimum mean squared error criterion and describes its application to statistical parametric speech synthesis. The proposed method alleviates some of the issues associated with conventional complex cepstrum analysis, such as the choice of window, phase unwrapping, and the need for accurate pitch marks. Given initial estimates of the warped complex cepstra and their respective analysis instants, the method iteratively optimizes the complex cepstrum in a warped quefrency domain by minimizing the mean squared error between the natural and the reconstructed speech waveforms. When applied to statistical parametric speech synthesis, the optimized complex cepstrum yields better synthesized speech quality, especially for emotional databases, than the complex cepstrum calculated through conventional methods.
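The iterative idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes a single frame, an impulse excitation, circular (DFT-based) indexing, and no frequency warping, and the names `cepstrum_to_response` and `refine_cepstrum`, the step size, and the iteration count are all hypothetical choices made for this illustration. Starting from an inaccurate initial cepstrum, plain gradient descent reduces the mean squared error between a target waveform and the waveform reconstructed from the cepstrum, with no phase unwrapping involved.

```python
import numpy as np

def cepstrum_to_response(c):
    """Map a complex cepstrum (real sequence, circular indexing) to a waveform."""
    return np.fft.ifft(np.exp(np.fft.fft(c))).real

def refine_cepstrum(target, c_init, step=4.0, n_iter=400):
    """Gradient descent on the cepstrum to minimize mean((target - h)**2).

    For h = cepstrum_to_response(c), dh[n]/dc[m] = h[(n - m) mod N], so the
    gradient is a circular cross-correlation of the error with h, computed
    here via the FFT. Step size and iteration count are illustrative only.
    """
    c = c_init.copy()
    n = len(c)
    for _ in range(n_iter):
        h = cepstrum_to_response(c)
        err = target - h
        corr = np.fft.ifft(np.fft.fft(err) * np.conj(np.fft.fft(h))).real
        c += step * (2.0 / n) * corr  # move against the MSE gradient
    return c

# Toy target: a mixed-phase response built from a known cepstrum
# (indices 1 and 2 are causal terms, index -1 is an anticausal term).
n = 64
c_true = np.zeros(n)
c_true[1], c_true[2], c_true[-1] = 0.5, -0.2, 0.3
target = cepstrum_to_response(c_true)

# Deliberately inaccurate initial estimate, standing in for a conventional analysis.
c_init = 0.8 * c_true
c_opt = refine_cepstrum(target, c_init)

mse_init = np.mean((target - cepstrum_to_response(c_init)) ** 2)
mse_final = np.mean((target - cepstrum_to_response(c_opt)) ** 2)
print("MSE before refinement:", mse_init)
print("MSE after refinement: ", mse_final)
```

In this toy setting the problem decouples per frequency bin and could be solved in closed form. The sketch only shows the shape of the optimization loop; the setting described in the abstract additionally involves analysis instants and a warped quefrency domain, which are not modeled here.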
Pages: 2335-2339
Page count: 5