A Wikipedia-based Corpus for Contextualized Machine Translation

Cited by: 0
Authors
Drexler, Jennifer [1]
Rastogi, Pushpendre [2]
Aguilar, Jacqueline [3]
Van Durme, Benjamin [3]
Post, Matt [3]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
Machine Translation; Domain Adaptation; Corpus;
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
We describe a corpus for and experiments in target-contextualized machine translation (MT), in which we incorporate language models from target-language documents that are comparable in nature to the source documents. This corpus comprises (i) a set of curated English Wikipedia articles describing news events along with (ii) their comparable Spanish counterparts, (iii) a number of the Spanish source articles cited within them, and (iv) English reference translations of all the Spanish data. In experiments, we evaluate the effect on translation quality when including language models built over these English documents and interpolated with other, separately-derived, more general language model sources. We find that even under this simplistic baseline approach, we achieve significant improvements as measured by BLEU score.
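The abstract mentions language models built over comparable English documents and interpolated with more general language models, but does not spell out the interpolation scheme. The sketch below illustrates one common choice, linear interpolation, using toy add-one-smoothed unigram models. The function names, the weight `lam`, and the example sentences are all assumptions made for illustration; they are not taken from the paper or its corpus.

```python
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_context, p_general, lam):
    """Linear interpolation: lam * p_context + (1 - lam) * p_general."""
    return {w: lam * p_context[w] + (1 - lam) * p_general[w] for w in p_context}


# Toy data, invented for illustration (not drawn from the corpus).
context_tokens = "the earthquake struck the coast".split()
general_tokens = "the market rose and the coast was calm".split()
vocab = sorted(set(context_tokens) | set(general_tokens))

p_ctx = unigram_model(context_tokens, vocab)   # "contextual" model
p_gen = unigram_model(general_tokens, vocab)   # "general" model
p_mix = interpolate(p_ctx, p_gen, lam=0.5)     # hypothetical weight
```

In this toy setup, a word seen only in the contextual text (here `earthquake`) receives more probability under `p_mix` than under the general model alone, which is the intuition behind adding a document-specific model. A real system would use higher-order n-gram models and tune the weight on held-out data.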
Pages: 3593 - 3596
Page count: 4