Word-Level Quality Estimation for Korean-English Neural Machine Translation

Cited by: 3
Authors
Eo, Sugyeong [1]
Park, Chanjun [1,2]
Moon, Hyeonseok [1]
Seo, Jaehyung [1]
Lim, Heuiseok [1]
Affiliations
[1] Korea Univ, Dept Comp Sci & Engn, Seoul 02841, South Korea
[2] Upstage, Yongin 16942, Gyeonggi Do, South Korea
Keywords
Predictive models; Data models; Feature extraction; Task analysis; Annotations; Costs; Machine translation; Quality estimation; Neural machine translation; Multilingual pretrained language model; Natural language processing
DOI
10.1109/ACCESS.2022.3169155
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The quality estimation (QE) task aims to predict the quality of a machine translation (MT) output from the source sentence and the MT output alone. The wide applicability of QE underscores the importance of QE research, but the enormous human labor required to construct QE datasets remains a challenge. This study proposes three strategies for automatically constructing word-level pseudo-QE data from a monolingual or parallel corpus and an external machine translator, without human labor. We use each pseudo-QE dataset to fine-tune multilingual pretrained language models (mPLMs), namely cross-lingual language models (XLM), XLM-RoBERTa (XLM-R), and multilingual BART, and compare the results. Because the training data are synthetic, we assess the objectivity of the QE models on four test sets translated by external translators from Google, Amazon, Microsoft, and Systran. XLM-R-large shows the best performance among the mPLMs. The small performance gaps across the different test sets also support the reliability of the QE models. To the best of our knowledge, this is the first study to experiment with word-level Korean-English QE.
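As a rough illustration of what word-level pseudo-QE labels might look like, the sketch below tags each MT token as OK or BAD by aligning it against a reference translation taken from a parallel corpus. The abstract does not describe the three construction strategies in detail, so the alignment method (Python's `difflib`), the function name `pseudo_word_tags`, and the example sentences are assumptions made for illustration, not the authors' procedure.

```python
from difflib import SequenceMatcher


def pseudo_word_tags(mt_tokens, ref_tokens):
    """Tag each MT token OK if it falls in a block matched to the reference, else BAD.

    Hypothetical helper: a simple stand-in for an edit-distance style alignment.
    """
    tags = ["BAD"] * len(mt_tokens)
    # autojunk=False keeps frequent tokens from being ignored during matching.
    matcher = SequenceMatcher(a=mt_tokens, b=ref_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


# Made-up example: the MT output differs from the reference in one word.
mt = "the cat sat in the mat".split()
ref = "the cat sat on the mat".split()
tags = pseudo_word_tags(mt, ref)
print(list(zip(mt, tags)))
```

In this toy case only the token "in" is tagged BAD. In a real pipeline the MT side would come from an external translator and the alignment would likely use a proper translation edit rate tool rather than `difflib`.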
Pages: 44964-44973
Page count: 10