The KIT Lecture Corpus for Speech Translation

Authors
Stueker, Sebastian [1]
Kraft, Florian [1]
Mohr, Christian [1]
Herrmann, Teresa [1]
Cho, Eunah [1]
Waibel, Alex [1]
Affiliation
[1] Karlsruhe Inst Technol, Interact Syst Labs, D-76021 Karlsruhe, Germany
Source
LREC 2012 - Eighth International Conference on Language Resources and Evaluation | 2012
Keywords
speech translation; talk translation; corpus
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Academic lectures offer valuable content but often fail to reach their full potential audience because of the language barrier. Human translation of lectures is too expensive to be widely used, and speech translation technology can be an affordable alternative. State-of-the-art spoken language translation systems rely on statistical models that must be trained on large amounts of in-domain data. To support the KIT lecture translation project in its effort to introduce speech translation technology in KIT's lecture halls, we have collected a corpus of German lectures at KIT. In this paper we describe how we recorded and annotated the lectures, and we give detailed statistics on the size of the corpus and the types of lectures it contains. We designed the corpus not only for training a spoken language translation system in the traditional way, but also for researching techniques that enable the translation system to adapt itself automatically and autonomously to the varying topics and speakers of the different lectures.
Pages: 3409-3414
Page count: 6