Harmonics Plus Noise Model Based Vocoder for Statistical Parametric Speech Synthesis

Cited by: 90
Authors
Erro, Daniel [1, 2]
Sainz, Inaki [1]
Navas, Eva [1]
Hernaez, Inma [1]
Affiliations
[1] Univ Basque Country UPV EHU, AhoLab Signal Proc Lab, Bilbao 48013, Spain
[2] Basque Fdn Sci, IKERBASQUE, Bilbao 48011, Spain
Keywords
Harmonics plus noise model; statistical parametric speech synthesis; vocoder; voice transformation
DOI
10.1109/JSTSP.2013.2283471
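Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract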
This article explores the potential of the harmonics plus noise model of speech in the development of a high-quality vocoder applicable in statistical frameworks, particularly in modern speech synthesizers. It presents an extensive explanation of all the different alternatives considered during the design of the HNM-based vocoder, together with the corresponding objective and subjective experiments, and a careful description of its implementation details. Three aspects of the analysis have been investigated: refinement of the pitch estimation using quasi-harmonic analysis, study and comparison of several spectral envelope analysis procedures, and strategies to analyze and model the maximum voiced frequency. The performance of the resulting vocoder is shown to be similar to that of state-of-the-art vocoders in synthesis tasks.
Pages: 184-194
Page count: 11