Debiasing Vandalism Detection Models at Wikidata

Cited by: 9
Authors
Heindorf, Stefan [1 ]
Scholten, Yan [1 ]
Engels, Gregor [1 ]
Potthast, Martin [2 ]
Affiliations
[1] Paderborn Univ, Paderborn, Germany
[2] Univ Leipzig, Leipzig, Germany
Source
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019
Keywords
DISCRIMINATION; BIAS;
DOI
10.1145/3308558.3313507
CLC number (Chinese Library Classification)
TP301 [Theory and Methods]
Discipline code
081202
Abstract
Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and they employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from established users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.
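The abstract's "bias ratio" can be pictured as comparing the vandalism scores a classifier assigns to benign edits from a protected group (anonymous/new users) against benign edits from everyone else. The sketch below is an illustrative assumption, not the paper's exact definition: the helper `bias_ratio`, the group labels, and all toy scores are hypothetical.

```python
# Hypothetical sketch of a bias-ratio measurement in the spirit of the
# abstract: mean vandalism score of benign edits from the protected group
# divided by the mean score of benign edits from all other users.
# The paper's precise definition may differ; all names and data here are
# illustrative assumptions.

def bias_ratio(scores, groups, benign, protected="anonymous"):
    """Ratio of mean benign-edit scores: protected group vs. the rest."""
    prot = [s for s, g, b in zip(scores, groups, benign) if b and g == protected]
    rest = [s for s, g, b in zip(scores, groups, benign) if b and g != protected]
    return (sum(prot) / len(prot)) / (sum(rest) / len(rest))

# Toy data: benign anonymous edits receive much higher scores than benign
# registered edits; the last edit is actual vandalism and is excluded.
scores = [0.80, 0.70, 0.02, 0.01, 0.90]
groups = ["anonymous", "anonymous", "registered", "registered", "anonymous"]
benign = [True, True, True, True, False]
print(bias_ratio(scores, groups, benign))  # roughly 50: a strongly biased model
```

A perfectly unbiased model would score benign edits the same regardless of user group, giving a ratio near 1; the paper reports reducing WDVD's ratio of 310.7 to 11.9 with FAIR-S.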
Pages: 670-680
Page count: 11