Unsupervised features from text for speech synthesis in a speech-to-speech translation system

Cited by: 0
Authors
Watts, Oliver [1]
Zhou, Bowen [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
speech synthesis
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We explore the use, in the context of a speech-to-speech translation system, of linguistic features for text-to-speech (TTS) conversion that can be extracted from unannotated text in an unsupervised, language-independent fashion. The features are intended to act as surrogates for conventional part-of-speech (POS) features. Unlike POS features, the experimental features assume only the availability of tools and data that must already be in place for the construction of other components of the translation system, and can therefore be used for the TTS module without incurring additional TTS-specific costs. We describe the use of the experimental features in a speech synthesiser, using six different configurations of the system to allow the comparison of the proposed features with conventional, knowledge-based POS features. We present results of objective and subjective evaluations of the usefulness of the new features.
Pages: 2164-2167
Page count: 4