Debiasing Vandalism Detection Models at Wikidata

Cited by: 9
Authors
Heindorf, Stefan [1 ]
Scholten, Yan [1 ]
Engels, Gregor [1 ]
Potthast, Martin [2 ]
Affiliations
[1] Paderborn Univ, Paderborn, Germany
[2] Univ Leipzig, Leipzig, Germany
Source
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019
Keywords
DISCRIMINATION; BIAS;
DOI
10.1145/3308558.3313507
CLC number (Chinese Library Classification)
TP301 [Theory and Methods]
Discipline code
081202
Abstract
Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and they employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from established users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.
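The abstract's "bias ratio" can be pictured as comparing the vandalism scores a classifier assigns to benign edits from a protected group (anonymous/new users) against benign edits from everyone else. The sketch below is an illustrative assumption, not the paper's exact definition: the helper `bias_ratio`, the group labels, and all toy scores are hypothetical.

```python
# Hypothetical sketch of a bias-ratio measurement in the spirit of the
# abstract: mean vandalism score of benign edits from the protected group
# divided by the mean score of benign edits from all other users.
# The paper's precise definition may differ; all names and data here are
# illustrative assumptions.

def bias_ratio(scores, groups, benign, protected="anonymous"):
    """Ratio of mean benign-edit scores: protected group vs. the rest."""
    prot = [s for s, g, b in zip(scores, groups, benign) if b and g == protected]
    rest = [s for s, g, b in zip(scores, groups, benign) if b and g != protected]
    return (sum(prot) / len(prot)) / (sum(rest) / len(rest))

# Toy data: benign anonymous edits receive much higher scores than benign
# registered edits; the last edit is actual vandalism and is excluded.
scores = [0.80, 0.70, 0.02, 0.01, 0.90]
groups = ["anonymous", "anonymous", "registered", "registered", "anonymous"]
benign = [True, True, True, True, False]
print(bias_ratio(scores, groups, benign))  # roughly 50: a strongly biased model
```

A perfectly unbiased model would score benign edits the same regardless of user group, giving a ratio near 1; the paper reports reducing WDVD's ratio of 310.7 to 11.9 with FAIR-S.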
Pages: 670-680
Page count: 11