Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information

Cited by: 0
Authors
Wan, Ruyuan [1]
Kim, Jaehyung [2]
Kang, Dongyeop [3]
Affiliations
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[3] Univ Minnesota, Minneapolis, MN USA
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12 | 2023
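Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract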
In NLP annotation, it is common to have multiple annotators label each text and then derive the ground-truth label from the majority of annotators. However, annotators are individuals with different backgrounds, and minority opinions should not simply be ignored. As annotation tasks become more subjective and topics more controversial in modern NLP, we need NLP systems that can represent people's diverse voices on subjective matters and predict the level of diversity. This paper examines whether the text of the task and annotators' demographic background information can be used to estimate the level of disagreement among annotators. In particular, we extract disagreement labels from the annotators' voting histories in five subjective datasets and then fine-tune language models to predict annotators' disagreement. Our results show that knowing annotators' demographic information, such as gender, ethnicity, and education level, helps predict disagreement. To distinguish disagreement that stems from the inherent controversy of the text content from disagreement that stems from annotators' different perspectives, we simulate everyone's voices with different combinations of artificial annotator demographics and examine the resulting variance in the fine-tuned disagreement predictor's output. Our paper aims to improve the annotation process for more efficient and inclusive NLP systems through a novel disagreement prediction mechanism. Our code and dataset are publicly available.
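As a rough illustration of what extracting a disagreement label from voting histories could look like, the sketch below scores each item by the share of annotators who did not pick the most common label. This definition, the function names `disagreement_rate` and `binary_disagreement`, the `threshold` parameter, and the toy votes are all assumptions made for illustration; the abstract does not specify how the paper computes its labels.

```python
from collections import Counter


def disagreement_rate(votes):
    """Share of annotators who did not choose the most common label.

    Assumed definition for illustration only; not taken from the paper.
    """
    if not votes:
        raise ValueError("votes must be non-empty")
    # Size of the largest group of annotators who agree with each other.
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def binary_disagreement(votes, threshold=0.0):
    """Return 1 if the disagreement rate exceeds the (hypothetical) threshold."""
    return int(disagreement_rate(votes) > threshold)


# Toy voting histories: item id -> labels given by individual annotators.
votes_per_item = {
    "item_a": ["offensive", "offensive", "offensive"],
    "item_b": ["offensive", "not_offensive", "offensive", "not_offensive"],
}

labels = {item: disagreement_rate(v) for item, v in votes_per_item.items()}
```

Scores of this kind could serve as continuous regression targets, or be thresholded into binary targets, when fine-tuning a language model on the item text.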
Pages: 14523-14530
Number of pages: 8