Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Cited by: 0
Authors
Koehn, Philipp [1]
Guzman, Francisco [2]
Chaudhary, Vishrav [2]
Pino, Juan [2]
Affiliations
[1] Johns Hopkins University, Baltimore, MD 21218, USA
[2] Facebook AI, Menlo Park, CA, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Following the WMT 2018 Shared Task on Parallel Corpus Filtering (Koehn et al., 2018), we posed the challenge of assigning sentence-level quality scores to very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality 2% and 10% of the data to train machine translation systems. This year, the task tackled the low-resource conditions of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities took part in the task.
Pages: 54-72
Page count: 19
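The filtering task summarized in the abstract has two steps: score every sentence pair, then keep only the highest-scoring share of the corpus. The Python sketch below illustrates that pipeline under stated assumptions; it is not the shared task's official scoring or sub-selection tooling. The names `length_ratio_score` and `subselect` are hypothetical, the scorer is a deliberately crude length-ratio heuristic standing in for a real quality model, and measuring the 2% / 10% budget in English words (rather than, say, sentence pairs) is an assumption.

```python
from typing import Callable, List, Tuple

# A sentence pair: (Nepali or Sinhala sentence, English sentence).
Pair = Tuple[str, str]


def length_ratio_score(src: str, tgt: str) -> float:
    """Hypothetical toy scorer: 0.0 if either side is empty, otherwise the
    ratio of the shorter to the longer side in whitespace tokens (max 1.0)."""
    s, t = len(src.split()), len(tgt.split())
    if s == 0 or t == 0:
        return 0.0
    return min(s, t) / max(s, t)


def subselect(pairs: List[Pair],
              score: Callable[[str, str], float],
              fraction: float) -> List[Pair]:
    """Greedily keep the highest-scoring pairs until adding the next one
    would exceed `fraction` of the corpus's English word count
    (the word-count budget is an assumption of this sketch)."""
    total_words = sum(len(tgt.split()) for _, tgt in pairs)
    budget = fraction * total_words
    # sorted() is stable, so pairs with equal scores keep their corpus order.
    ranked = sorted(pairs, key=lambda p: score(p[0], p[1]), reverse=True)
    selected: List[Pair] = []
    used = 0
    for src, tgt in ranked:
        n = len(tgt.split())
        if used + n > budget:
            break  # stop at the first pair that no longer fits the budget
        selected.append((src, tgt))
        used += n
    return selected


if __name__ == "__main__":
    # Placeholder tokens instead of real Nepali/Sinhala text.
    corpus = [("a b c", "x y z"), ("a", "x y z w v u"), ("", "x"), ("a b", "x y")]
    for frac in (0.02, 0.10, 0.5):
        print(frac, subselect(corpus, length_ratio_score, frac))
```

Stopping at the first pair that overflows the budget, instead of skipping it and continuing, keeps the selection a strict top-ranked prefix, so a short but low-scoring pair is never added just because it happens to fit.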