A Wikipedia-based Corpus for Contextualized Machine Translation

Cited by: 0

Authors:

Drexler, Jennifer ^{[1]}

Rastogi, Pushpendre ^{[2]}

Aguilar, Jacqueline ^{[3]}

Van Durme, Benjamin ^{[3]}

Post, Matt ^{[3]}

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014

Keywords:

Machine Translation; Domain Adaptation; Corpus;

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Discipline classification code:

030303 ; 0501 ; 050102 ;

Abstract:

We describe a corpus for and experiments in target-contextualized machine translation (MT), in which we incorporate language models from target-language documents that are comparable in nature to the source documents. This corpus comprises (i) a set of curated English Wikipedia articles describing news events along with (ii) their comparable Spanish counterparts, (iii) a number of the Spanish source articles cited within them, and (iv) English reference translations of all the Spanish data. In experiments, we evaluate the effect on translation quality when including language models built over these English documents and interpolated with other, separately-derived, more general language model sources. We find that even under this simplistic baseline approach, we achieve significant improvements as measured by BLEU score.
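As an illustration of the interpolation idea mentioned in the abstract, the following is a minimal sketch of linearly interpolating a small "context" language model with a more general one. It is not the authors' setup: the unigram models, the add-alpha smoothing, the toy sentences, the weight `lam`, and the function names `unigram_lm` and `interpolate` are all assumptions made for this example.

```python
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_context, p_general, lam):
    """Linear interpolation: lam * p_context(w) + (1 - lam) * p_general(w)."""
    return {w: lam * p_context[w] + (1.0 - lam) * p_general[w] for w in p_context}


# Toy data (hypothetical): a comparable target-language document
# and a more general target-language text.
context_tokens = "the earthquake struck the coast".split()
general_tokens = "the market rose and the coast was calm".split()
vocab = sorted(set(context_tokens) | set(general_tokens))

p_ctx = unigram_lm(context_tokens, vocab)
p_gen = unigram_lm(general_tokens, vocab)

# lam is an assumed interpolation weight; in practice it would be tuned.
p_mix = interpolate(p_ctx, p_gen, lam=0.3)
```

In this sketch, words that occur in the context document (such as "earthquake") receive more probability mass under `p_mix` than under the general model alone, while the mixture remains a valid distribution over the shared vocabulary.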


Pages: 3593-3596

Page count: 4
