Data preprocessing techniques for classification without discrimination

Cited by: 644

Authors:

Kamiran, Faisal

Calders, Toon

Affiliations:

[1] 5600 MB Eindhoven, HG 7.46

[2] 5600 MB Eindhoven, HG 7.82a

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2012 / Vol. 33 / Issue 01

Keywords:

Classification; Preprocessing; Discrimination-aware data mining;

DOI:

10.1007/s10115-011-0463-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, the following Discrimination-Aware Classification Problem was introduced: Suppose we are given training data that exhibit unlawful discrimination; e.g., toward sensitive attributes such as gender or ethnicity. The task is to learn a classifier that optimizes accuracy, but does not have this discrimination in its predictions on test data. This problem is relevant in many settings, such as when the data are generated by a biased decision process or when the sensitive attribute serves as a proxy for unobserved features. In this paper, we concentrate on the case with only one binary sensitive attribute and a two-class classification problem. We first study the theoretically optimal trade-off between accuracy and non-discrimination for pure classifiers. Then, we look at algorithmic solutions that preprocess the data to remove discrimination before a classifier is learned. We survey and extend our existing data preprocessing techniques, being suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data to remove discrimination without relabeling instances. These preprocessing techniques have been implemented in a modified version of Weka and we present the results of experiments on real-life data.
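
As an illustration of the reweighing idea named in the abstract, the following is a minimal Python sketch, not the authors' modified Weka implementation. It assumes the commonly cited formulation in which each instance with sensitive value s and class c receives the weight P(s) * P(c) / P(s, c), and it measures discrimination as the difference in positive-class rates between the two groups. The function names, the group codes "m" and "f", and the toy data are all hypothetical.

```python
from collections import Counter

def reweigh(sensitive, labels):
    """Return one weight per instance: P(s) * P(c) / P(s, c) (assumed formulation)."""
    n = len(labels)
    s_counts = Counter(sensitive)                 # group sizes
    c_counts = Counter(labels)                    # class sizes
    sc_counts = Counter(zip(sensitive, labels))   # joint (group, class) sizes
    return [
        (s_counts[s] * c_counts[c]) / (n * sc_counts[(s, c)])
        for s, c in zip(sensitive, labels)
    ]

def discrimination(sensitive, labels, weights=None, favored="m", positive=1):
    """Weighted positive-class rate of the favored group minus that of the other group."""
    if weights is None:
        weights = [1.0] * len(labels)

    def pos_rate(in_favored):
        rows = [(c, w) for s, c, w in zip(sensitive, labels, weights)
                if (s == favored) == in_favored]
        total = sum(w for _, w in rows)
        return sum(w for c, w in rows if c == positive) / total

    return pos_rate(True) - pos_rate(False)

# Made-up toy data: 5 instances per group, with more positives in group "m".
sensitive = ["m"] * 5 + ["f"] * 5
labels = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
weights = reweigh(sensitive, labels)

print(discrimination(sensitive, labels))           # unweighted gap
print(discrimination(sensitive, labels, weights))  # gap after reweighing
```

On this toy data the unweighted gap is 0.4 and the reweighed gap is zero up to floating-point error, with no class label changed. The weights could then be passed to any learner that accepts per-instance weights.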

Pages: 1-33

Page count: 33
