Parameterization of Vocal Fry in HMM-Based Speech Synthesis

Times cited: 0
Authors
Silen, Hanna [1]
Helander, Elina [1]
Nurminen, Jani [2]
Gabbouj, Moncef [1]
Affiliations
[1] Tampere University of Technology, Department of Signal Processing, Tampere, Finland
[2] Nokia Devices R&D, Tampere, Finland
Source
INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association 2009, Vols 1-5 | 2009
Funding
Academy of Finland
Keywords
speech synthesis; hidden Markov models; vocal fry; mixed excitation; STRAIGHT
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
HMM-based speech synthesis offers a way to generate speech with different voice qualities. However, speech databases sometimes contain inherent voice qualities that need to be parameterized properly. One example is vocal fry, which typically occurs at the end of utterances. STRAIGHT is a popular mixed excitation vocoder for HMM-based speech synthesis. The standard STRAIGHT is optimized for modal voices and may not produce high-quality output for other voice types. Fortunately, owing to the flexibility of STRAIGHT, different F0 and aperiodicity measures can be used in synthesis without any inherent degradation in speech quality. We have replaced the STRAIGHT excitation with a representation based on a robust F0 measure and a carefully determined two-band voicing. According to our analysis-synthesis experiments, the new parameterization can improve speech quality. In HMM-based speech synthesis, quality improves significantly, especially because of the better modeling of vocal fry.
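The idea of a two-band voicing can be illustrated with a minimal sketch of a mixed excitation frame. This is not the authors' implementation: the function name `two_band_excitation`, the single voicing cutoff per frame, the flat harmonic amplitudes, and the random-amplitude, random-phase noise model are all illustrative assumptions. Below the cutoff the frame contains only harmonics of F0 (the periodic band); above it, only random components (the aperiodic band). An F0 of zero is assumed to mark an unvoiced frame, which becomes noise over the whole band.

```python
import math
import random


def two_band_excitation(f0, cutoff_hz, fs=16000, n_samples=400, seed=0):
    """Hypothetical sketch of a two-band mixed excitation for one frame.

    Assumed model (not from the paper): harmonics of f0 below cutoff_hz,
    random components above it. f0 <= 0 marks an unvoiced frame.
    """
    rng = random.Random(seed)
    nyquist = fs / 2.0
    if f0 <= 0:
        cutoff_hz = 0.0  # unvoiced frame: no periodic band at all
    cutoff_hz = min(cutoff_hz, nyquist)
    out = [0.0] * n_samples

    # Periodic band: unit-amplitude cosine harmonics of f0 below the cutoff.
    if f0 > 0:
        k = 1
        while k * f0 < cutoff_hz:
            w = 2.0 * math.pi * k * f0 / fs
            for n in range(n_samples):
                out[n] += math.cos(w * n)
            k += 1

    # Aperiodic band: one random-amplitude, random-phase cosine per
    # frame-DFT bin, from the cutoff up to (but excluding) Nyquist.
    step = fs / n_samples  # DFT bin spacing in Hz for this frame length
    b = max(1, math.ceil(cutoff_hz / step))
    while b * step < nyquist:
        w = 2.0 * math.pi * b * step / fs
        amp = rng.gauss(0.0, 1.0)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for n in range(n_samples):
            out[n] += amp * math.cos(w * n + phase)
        b += 1
    return out
```

For example, `two_band_excitation(200.0, 3000.0)` returns a 400-sample frame (25 ms at 16 kHz) whose spectrum below 3 kHz contains only the harmonics of 200 Hz, with noise filling the band from 3 kHz up to Nyquist. In a complete vocoder this excitation would then be shaped by the spectral envelope, a step the sketch omits.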
Pages: 1735 ff.
Page count: 2
Related papers
15 in total
[1] [Anonymous], 1999, P EUROSPEECH
[2] [Anonymous], 2005, P INT 2005 LISB PORT
[3] [Anonymous], 2005, P INTERSPEECH 2005 L
[4] Childers, D. G.; Lee, C. K. Vocal quality factors: analysis, synthesis, and perception. Journal of the Acoustical Society of America, 1991, 90(5): 2394-2410.
[5] Hollien, H.; Wendahl, R. W. Perceptual study of vocal fry. Journal of the Acoustical Society of America, 1968, 43(3): 506 ff.
[6] Iivonen, A., 2004, NORDIC PROSODY, P137
[7] Kawahara, H.; Masuda-Katsuse, I.; de Cheveigné, A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 1999, 27(3-4): 187-207.
[8] Kawahara, H., 1999, FIXED POINT ANAL FRE, P2781
[9] Kawahara, H. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. Acoustical Science and Technology, 2006, 27(6): 349-353.
[10] Kim, S. J., 2006, IEEE T CONSUMER ELEC, V52