Runtime and Speech Quality Survey of a Voice Conversion Method

Cited by: 0
Authors
Jokisch, Oliver [1]
Birhanu, Yitagessu [2]
Hoffmann, Ruediger [2]
Affiliations
[1] Leipzig Univ Telecommun, Inst Commun Engn, Gustav Freytag St 43, D-04277 Leipzig, Germany
[2] Tech Univ Dresden, Chair Syst Theory & Speech Technol, D-01069 Dresden, Germany
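Source
2013 IEEE EUROCON | 2013
Keywords
voice conversion; VTLN; runtime performance; speech quality; MOS
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract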
Several methods for voice conversion have been established. Research in this field aims to reproduce the characteristics of a target speaker while retaining near-natural speech quality. This contribution summarizes listening experiments with four conversion methods, covering the assessment of speech quality, listening effort and similarity to the target voice. The subjective evaluation of similarity is checked against an instrumental distance measure based on logarithmic spectral distortion. Practical applications of voice conversion require appropriate runtime performance and memory use. We select a conversion method based on VTLN to demonstrate the trade-off between runtime and quality. In this case study, we survey how the quality assessment depends on different training constellations with varied amounts of data and training time. Furthermore, we discuss the runtime performance of the selected conversion method under typical operating conditions. The experiments cover the influence of system resources, the setting of conversion parameters (warping factors) and different training constellations. The observed real-time factors of a non-optimized laboratory VC version are not suitable for typical application scenarios.
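The abstract names two instrumental quantities, logarithmic spectral distortion and the real-time factor, without giving their formulas in this record. The following is a minimal sketch under common textbook definitions, not the authors' implementation: the function names, the `eps` guard and the use of power spectra are assumptions made for illustration.

```python
import math


def log_spectral_distortion(power_a, power_b, eps=1e-12):
    """RMS difference in dB between two power spectra of equal length.

    Assumed definition: sqrt(mean_k (10*log10(A_k / B_k))^2).
    eps (hypothetical guard) avoids log of zero for empty bins.
    """
    if len(power_a) != len(power_b) or len(power_a) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_db = [
        (10.0 * math.log10((a + eps) / (b + eps))) ** 2
        for a, b in zip(power_a, power_b)
    ]
    return math.sqrt(sum(squared_db) / len(squared_db))


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by the duration of the processed audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Under these definitions, identical spectra give a distortion of 0 dB, and a real-time factor above 1 means the conversion runs slower than the audio it produces.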
Pages: 1684-1688
Page count: 5