Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation

Cited by: 133
Authors
Aroyo, Lora [1]
Welty, Chris [2, 3]
Affiliations
[1] Vrije Univ Amsterdam, Dept Comp Sci, Web & Media Grp, Amsterdam, Netherlands
[2] Google Res, New York, NY USA
[3] Rensselaer Polytech Inst, Comp Sci, Troy, NY 12181 USA
Keywords
Artificial intelligence - Semantics;
DOI
10.1609/aimag.v36i1.2564
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big data is having a disruptive impact across the sciences. Human annotation of semantic interpretation tasks is a critical part of big data semantics, but it is based on an antiquated ideal of a single correct truth that needs to be similarly disrupted. We expose seven myths about human annotation, most of which derive from that antiquated ideal of truth, and dispel these myths with examples from our research. We propose a new theory of truth, crowd truth, that is based on the intuition that human interpretation is subjective, and that measuring annotations on the same objects of interpretation (in our examples, sentences) across a crowd will provide a useful representation of their subjectivity and the range of reasonable interpretations.
Pages: 15-24
Page count: 10