Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Cited by: 0
Authors
Koehn, Philipp [1 ]
Guzman, Francisco [2 ]
Chaudhary, Vishrav [2 ]
Pino, Juan [2 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Facebook AI, Menlo Pk, CA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Following the WMT 2018 Shared Task on Parallel Corpus Filtering (Koehn et al., 2018), we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data to be used to train machine translation systems. This year, the task tackled the low resource condition of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities participated in this task.
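The sub-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each sentence pair already has a quality score (higher is better) and that the target fraction is measured in sentence pairs; the record does not say how the 2% and 10% subsets are measured, so that choice, along with the function name `select_top_fraction` and the toy data, is hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

SentencePair = Tuple[str, str]


def select_top_fraction(
    pairs: Sequence[SentencePair],
    scores: Sequence[float],
    fraction: float,
) -> List[SentencePair]:
    """Keep the highest-scoring `fraction` of sentence pairs.

    Assumption: the fraction is counted in sentence pairs, not words.
    The kept pairs are returned in their original corpus order.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not pairs:
        return []

    # Number of pairs to keep; always keep at least one.
    keep = max(1, int(len(pairs) * fraction))

    # Rank indices by score, best first (ties keep corpus order).
    ranked = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)

    # Restore original order among the selected indices.
    chosen = sorted(ranked[:keep])
    return [pairs[i] for i in chosen]


# Toy data: ten placeholder sentence pairs with made-up scores.
toy_pairs = [(f"src {i}", f"tgt {i}") for i in range(10)]
toy_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.5]

top_10_percent = select_top_fraction(toy_pairs, toy_scores, 0.1)
top_half = select_top_fraction(toy_pairs, toy_scores, 0.5)
```

Participants supply only the scores; thresholding by rank rather than by a fixed score cutoff keeps the subset size the same no matter how each system scales its scores.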
Pages: 54-72
Number of pages: 19
Related papers
50 items in total
  • [1] Quality and Coverage: The AFRL Submission to the WMT19 Parallel Corpus Filtering For Low-Resource Conditions Task
    Erdmann, Grant
    Gwinnup, Jeremy
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 267 - 270
  • [2] Webinterpret Submission to the WMT2019 Shared Task on Parallel Corpus Filtering
    Gonzalez-Rubio, Jesus
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 271 - 276
  • [3] NRC Parallel Corpus Filtering System for WMT 2019
    Bernier-Colborne, Gabriel
    Lo, Chi-kiu
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 252 - 260
  • [4] Findings of the WMT 2019 Shared Task on Quality Estimation
    Fonseca, Erick
    Yankovskaya, Lisa
    Martins, Andre F. T.
    Fishel, Mark
    Federmann, Christian
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 1 - 10
  • [5] CUNI Submission for Low-Resource Languages in WMT News 2019
    Kocmi, Tom
    Bojar, Ondrej
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), 2019, : 234 - 240
  • [6] Findings of the WMT 2019 Shared Task on Automatic Post-Editing
    Chatterjee, Rajen
    Federmann, Christian
    Negri, Matteo
    Turchi, Marco
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 11 - 28
  • [7] The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task
    Vazquez, Raul
    Sulubacak, Umut
    Tiedemann, Jorg
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 294 - 300
  • [8] Findings of the WMT 2023 Shared Task on Quality Estimation
    Blain, Frédéric
    Zerva, Chrysoula
    Rei, Ricardo
    Guerreiro, Nuno M.
    Kanojia, Diptesh
    de Souza, José G.C.
    Silva, Beatriz
    Vaz, Tânia
    Jingxuan, Yan
    Azadi, Fatemeh
    Orăsan, Constantin
    Martins, André F.T.
    Conference on Machine Translation - Proceedings, 2023, : 627 - 651
  • [9] PROMT Systems for WMT 2019 Shared Translation Task
    Molchanov, Alexander
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), 2019, : 302 - 307
  • [10] Low-Resource Corpus Filtering using Multilingual Sentence Embeddings
    Chaudhary, Vishrav
    Tang, Yuqing
    Guzman, Francisco
    Schwenk, Holger
    Koehn, Philipp
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 3: SHARED TASK PAPERS, DAY 2, 2019, : 261 - 266