Perceptual distortion analysis and quality estimation of prosody-modified speech for TD-PSOLA

Authors
Chen, Shi-Han [1]
Chen, Shun-Ju [1]
Kuo, Chih-Chung [1]
Affiliation
[1] ITRI, ICL, Ctr Adv Technol, Hsinchu, Taiwan
Source
2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 | 2006
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
TD-PSOLA is one of the most widely used prosodic modification techniques. However, it occasionally introduces perceptible distortions, and how TD-PSOLA affects speech quality is not yet fully understood or controlled. In this paper, we present a method that estimates quality before the modification is performed. By exploiting the relationship between prosodic modifications and subjective scores, we propose 27 distance measures and report and compare their individual performance. An exhaustive search over every possible combination of these measures shows that the best correlation between predicted and subjective scores, 87.6%, is obtained by linear regression on 4 of the proposed distance measures. The proposed method does not require synthesizing the target speech and can be used both for online unit selection and for off-line corpus design in TTS systems.
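The selection step described in the abstract (search every combination of distance measures and score each one by how well a linear regression on it correlates with subjective scores) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical matrix `D` of precomputed distance values (rows are modified utterances, columns are distance measures) and a vector `y` of subjective scores, and the function names, the ordinary-least-squares model with an intercept, and Pearson correlation as the criterion are all assumptions. The 27 measures themselves are not reproduced here.

```python
from itertools import combinations

import numpy as np


def regression_correlation(X: np.ndarray, y: np.ndarray) -> float:
    """Fit y ~ X by ordinary least squares with an intercept (an assumed
    model form) and return the Pearson correlation between the fitted
    and the observed scores."""
    A = np.column_stack([np.ones(len(y)), X])  # intercept column + measures
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.corrcoef(A @ coef, y)[0, 1])


def best_combination(D: np.ndarray, y: np.ndarray, k: int):
    """Exhaustively evaluate every k-column subset of D.

    D: hypothetical (n_utterances, n_measures) matrix of distance values,
       one column per distance measure.
    y: subjective scores for the same utterances.
    Returns (best correlation, tuple of selected column indices).
    """
    best_r, best_cols = -np.inf, ()
    for cols in combinations(range(D.shape[1]), k):
        r = regression_correlation(D[:, list(cols)], y)
        if r > best_r:  # a NaN from a degenerate fit never wins this comparison
            best_r, best_cols = r, cols
    return best_r, best_cols


if __name__ == "__main__":
    # Synthetic data only, to exercise the search; not data from the paper.
    rng = np.random.default_rng(0)
    D = rng.normal(size=(100, 8))
    y = 1.5 * D[:, 2] - 0.8 * D[:, 5] + 0.1 * rng.normal(size=100)
    r, cols = best_combination(D, y, k=2)
    print(f"best pair {cols}, r = {r:.3f}")
```

With 27 measures and subsets of size 4, the loop fits C(27, 4) = 17,550 small regressions, which is cheap. Because in-sample correlation never decreases when measures are added, the sketch compares subsets of one fixed size; comparing different sizes fairly would need held-out data or cross-validation.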
Pages: 861-864
Page count: 4