Towards human linguistic machine translation evaluation

Cited by: 7
Authors
Costa-jussa, Marta R. [1]
Farrus, Mireia [2]
Affiliations
[1] Inst Infocomm Research, Singapore 138632, Singapore
[2] Pompeu Fabra Univ, Barcelona 08018, Spain
DOI
10.1093/llc/fqt065
Chinese Library Classification (CLC) number
C [Social Sciences: General];
Discipline classification code
03; 0303;
Abstract
When evaluating machine translation outputs, linguistics is usually taken into account implicitly. Annotators have to decide whether one sentence is better than another, using, for example, adequacy and fluency criteria or, as recently proposed, editing the translation output so that it is understandable and has the same meaning as a reference translation. Therefore, the important linguistic fields of meaning (semantics) and grammar (syntax) are indirectly considered. In this study, we propose to go one step further towards a linguistic human evaluation. The idea is to introduce linguistics implicitly by formulating precise guidelines. These guidelines strictly mark the difference between sub-fields of linguistics such as morphology, syntax, semantics, and orthography. We show that our guidelines have high inter-annotator agreement and wide error coverage. Additionally, we examine how the linguistic human evaluation data correlate across different types of machine translation systems (rule-based and statistical) and with adequacy and fluency.
Pages: 157-166
Number of pages: 10