A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis

Cited by: 0
Authors
Ribeiro, Manuel Sam [1]
Yamagishi, Junichi [1, 2]
Clark, Robert A. J. [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
[2] Natl Inst Informat, Tokyo, Japan
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
speech synthesis; prosody; f0; modeling; continuous wavelet transform; perceptual experiments;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
The Continuous Wavelet Transform (CWT) has recently been proposed to model f0 in the context of speech synthesis. It was shown that systems using signal decomposition with the CWT tend to outperform systems that model the signal directly. The f0 signal is typically decomposed into various scales of differing frequency. In these experiments, we reconstruct f0 with selected frequencies and ask native listeners to judge the naturalness of synthesized utterances with respect to natural speech. Results indicate that HMM-generated f0 is comparable to the CWT low frequencies, suggesting it mostly generates utterances with neutral intonation. Middle frequencies achieve very high levels of naturalness, while very high frequencies are mostly noise.
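The decomposition and selective reconstruction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the record: the Mexican hat mother wavelet, the octave-spaced scales measured in frames, the function names, and the synthetic contour. The reconstruction is a plain unweighted sum of the selected scale components, not an exact inverse CWT, and the input is assumed to be an interpolated, continuous contour.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) wavelet, unnormalised (assumed mother wavelet)."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def cwt_decompose(f0, scales):
    """Return CWT coefficients with shape (len(scales), len(f0))."""
    f0 = np.asarray(f0, dtype=float)
    n = len(f0)
    coeffs = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet support so the kernel is never longer than the signal.
        half = int(min(5 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        kernel = mexican_hat(t) / np.sqrt(s)
        # The wavelet is symmetric, so convolution equals correlation here.
        coeffs[i] = np.convolve(f0, kernel, mode="same")
    return coeffs

def reconstruct(coeffs, keep):
    """Approximate partial reconstruction: sum only the selected scale rows."""
    return coeffs[list(keep)].sum(axis=0)

# Hypothetical contour: a slow phrase-like movement plus a faster ripple.
frames = np.arange(400)
contour = np.sin(2 * np.pi * frames / 200) + 0.3 * np.sin(2 * np.pi * frames / 20)

scales = 2.0 ** np.arange(1, 6)           # 2, 4, 8, 16, 32 frames (assumed)
coeffs = cwt_decompose(contour, scales)
low_band = reconstruct(coeffs, [3, 4])    # coarse scales only (low frequencies)
high_band = reconstruct(coeffs, [0, 1])   # fine scales only (high frequencies)
```

Because the reconstruction is a sum over rows, components built from disjoint sets of scales add up to the component built from their union, which makes it straightforward to compare contours rebuilt from low, middle, or high scales in isolation.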
Pages: 1586-1590
Number of pages: 5