Impacts of machine translation and speech synthesis on speech-to-speech translation

Cited by: 7
Authors
Hashimoto, Kei [1]
Yamagishi, Junichi [2]
Byrne, William [3]
King, Simon [2]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi, Japan
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[3] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Funding
Engineering and Physical Sciences Research Council (UK); Japan Society for the Promotion of Science
Keywords
Speech-to-speech translation; Machine translation; Speech synthesis; Subjective evaluation;
DOI
10.1016/j.specom.2012.02.004
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper analyzes the impacts of machine translation and speech synthesis on speech-to-speech translation systems. A typical speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques have been proposed for integration of speech recognition and machine translation. However, corresponding techniques have not yet been considered for speech synthesis. The focus of the current work is machine translation and speech synthesis, and we present a subjective evaluation designed to analyze their impact on speech-to-speech translation. The results of these analyses show that the naturalness and intelligibility of the synthesized speech are strongly affected by the fluency of the translated sentences. In addition, several features were found to correlate well with the average fluency of the translated sentences and the average naturalness of the synthesized speech. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 857-866
Page count: 10