Calibration and context in human evaluation of machine translation

Cited by: 1
Authors
Knowles, Rebecca [1]
Lo, Chi-kiu [1]
Affiliations
[1] National Research Council Canada, Ottawa, ON, Canada
Source
NATURAL LANGUAGE PROCESSING | 2025 / Vol. 31 / Issue 04
Keywords
machine translation; evaluation
DOI
10.1017/nlp.2024.5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human evaluation of machine translation is considered the "gold standard" for evaluation, but it remains a challenging task for which to define best practices. Recent work has focused on incorporating intersentential context into human evaluation, to better distinguish between high-performing machine translation systems and human translations. In this work, we examine several ways that such context influences evaluation and evaluation protocols. We take a close look at annotator variation through the lens of calibration sets and focus on the implications for context-aware evaluation protocols. We then demonstrate one way in which degraded target-side intersentential context can influence annotator scores of individual sentences, a finding that supports the context-aware approach to evaluation and which also has implications for best practices in evaluation protocols.
Pages: 1017-1041
Page count: 25