A Comparison of Voice Conversion Methods for Transforming Voice Quality in Emotional Speech Synthesis

Cited by: 0
Authors
Tuerk, Oytun [1]
Schroeder, Marc [1]
Affiliations
[1] DFKI GmbH, Language Technology Lab, Saarbrücken, Germany
Source
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008
Keywords
voice quality transformation; voice conversion; emotional speech synthesis
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper compares methods for transforming voice quality in neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated using the unit selection system of the MARY TTS platform and a large neutral German database. The output is then modified using voice conversion techniques to match the target expressive styles, with a focus on spectral envelope conversion for transforming overall voice quality. Several improvements over the state-of-the-art weighted codebook mapping and GMM-based voice conversion frameworks are employed, resulting in three algorithms. Objective evaluation shows that all three methods yield a comparable reduction in objective distance to the target expressive TTS outputs. In a listening test, however, the weighted frame mapping and GMM-based transformations were perceived as slightly better than the weighted codebook mapping outputs at generating the target expressive style.
Pages: 2282-2285
Page count: 4