A Wikipedia-based Corpus for Contextualized Machine Translation

Cited by: 0

Authors:

Drexler, Jennifer [1]

Rastogi, Pushpendre [2]

Aguilar, Jacqueline [3]

Van Durme, Benjamin [3]

Post, Matt [3]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014

Keywords:

Machine Translation; Domain Adaptation; Corpus;

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Discipline classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

We describe a corpus for and experiments in target-contextualized machine translation (MT), in which we incorporate language models from target-language documents that are comparable in nature to the source documents. This corpus comprises (i) a set of curated English Wikipedia articles describing news events along with (ii) their comparable Spanish counterparts, (iii) a number of the Spanish source articles cited within them, and (iv) English reference translations of all the Spanish data. In experiments, we evaluate the effect on translation quality when including language models built over these English documents and interpolated with other, separately-derived, more general language model sources. We find that even under this simplistic baseline approach, we achieve significant improvements as measured by BLEU score.

Citation

Pages: 3593 - 3596

Page count: 4

