INTENT TRANSFER IN SPEECH-TO-SPEECH MACHINE TRANSLATION

Cited by: 0
Authors
Anumanchipalli, Gopala Krishna [1]
Oliveira, Luis C.
Black, Alan W. [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012) | 2012
Keywords
Speech Translation; Prominence; Focus; Speech Synthesis; Cross-lingual Transfer;
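DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract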
This paper presents an approach for transfer of speaker intent in speech-to-speech machine translation (S2SMT). Specifically, we describe techniques to retain the prominence patterns of the source language utterance through the translation pipeline and impose this information during speech synthesis in the target language. We first present an analysis of word focus across languages to motivate the problem of transfer. We then propose an approach for training an appropriate transfer function for intonation on a parallel speech corpus in the two languages within which the translation is carried out. We present our analysis and experiments on English <-> Portuguese and English <-> German language pairs and evaluate the proposed transformation techniques through objective measures.
Pages: 153-158
Number of pages: 6