Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Authors
Koehn, Philipp [1]
Guzman, Francisco [2]
Chaudhary, Vishrav [2]
Pino, Juan [2]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Facebook AI, Menlo Pk, CA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Following the WMT 2018 Shared Task on Parallel Corpus Filtering (Koehn et al., 2018), we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data to be used to train machine translation systems. This year, the task tackled the low resource condition of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities participated in this task.
Pages: 54-72
Page count: 19