A Wikipedia-based Corpus for Contextualized Machine Translation

Cited by: 0
Authors
Drexler, Jennifer [1]
Rastogi, Pushpendre [2]
Aguilar, Jacqueline [3]
Van Durme, Benjamin [3]
Post, Matt [3]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
Machine Translation; Domain Adaptation; Corpus
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
We describe a corpus for and experiments in target-contextualized machine translation (MT), in which we incorporate language models from target-language documents that are comparable in nature to the source documents. This corpus comprises (i) a set of curated English Wikipedia articles describing news events along with (ii) their comparable Spanish counterparts, (iii) a number of the Spanish source articles cited within them, and (iv) English reference translations of all the Spanish data. In experiments, we evaluate the effect on translation quality when including language models built over these English documents and interpolated with other, separately-derived, more general language model sources. We find that even under this simplistic baseline approach, we achieve significant improvements as measured by BLEU score.
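The abstract mentions interpolating language models built over the comparable English documents with more general language models. As a rough illustration only, the sketch below linearly interpolates two add-one smoothed unigram models. The toy sentences, the function names `unigram_lm` and `interpolate`, the unigram order, and the weight `lam=0.5` are all assumptions made for this example; the record does not state which toolkit, n-gram order, or interpolation weights the authors used.

```python
from collections import Counter


def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_context, p_general, lam):
    """Linear interpolation: lam * p_context + (1 - lam) * p_general."""
    return {w: lam * p_context[w] + (1 - lam) * p_general[w] for w in p_context}


# Toy data invented for illustration; not drawn from the corpus.
context_tokens = "the earthquake struck the coast".split()
general_tokens = "the market rose and the coast was calm".split()
vocab = sorted(set(context_tokens) | set(general_tokens))

p_context = unigram_lm(context_tokens, vocab)   # document-specific model
p_general = unigram_lm(general_tokens, vocab)   # general-domain model
p_mixed = interpolate(p_context, p_general, lam=0.5)
```

In this toy setup the mixed model assigns more probability to "earthquake" than the general model alone does, which is the intuition behind adding a contextual model: words typical of the event become more likely in the translation output.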
Pages: 3593-3596
Number of pages: 4