Controlling Attribute Effect in Linear Regression

Times cited: 69
Authors
Calders, Toon [1]
Karim, Asim [2]
Kamiran, Faisal [3]
Ali, Wasif [2]
Zhang, Xiangliang [3]
Affiliations
[1] Univ Libre Bruxelles, Comp & Decis Engn Dept, Brussels, Belgium
[2] LUMS, SBASSE, Dept Comp Sci, Lahore, Pakistan
[3] KAUST, CEMSE Div, Thuwal, Saudi Arabia
Source
2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2013
Keywords
Linear Regression; Fair Data Mining; Batch Effects; Propensity Score; DISCRIMINATION; BIAS; RACE
DOI
10.1109/ICDM.2013.114
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In data mining we often have to learn from biased data, for instance because the data come from different batches or because the collection of social data was subject to gender or racial bias. In some applications it may be necessary to explicitly control this bias in the models learned from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute, such as gender or batch number. We show how propensity modeling can be used to factor out the part of the bias that can be justified by externally provided explanatory attributes. We then analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or on the residuals of the models. Experiments on discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques successfully control attribute effects in linear regression models.
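The abstract describes linear models that minimize squared error subject to a constraint on the mean outcome or on the residuals. A minimal sketch of that idea, assuming a binary attribute s, a single linear equality constraint, and a design matrix X with full column rank, is ordinary least squares with one Lagrange multiplier. Both constraints reduce to c^T w = d, where c is the difference of the group-wise feature means (d = 0 for equal mean outcomes; d = mean(y | s=1) - mean(y | s=0) for equal mean residuals). Setting the gradient of ||y - Xw||^2 + 2 lambda (c^T w - d) to zero gives w = w_OLS - lambda (X^T X)^{-1} c with lambda = (c^T w_OLS - d) / (c^T (X^T X)^{-1} c). The helper name constrained_ls and the synthetic data are hypothetical, and this is not claimed to be the paper's exact formulation; in particular it leaves out the propensity-based handling of explanatory attributes.

```python
import numpy as np


def constrained_ls(X, y, s, constraint="mean_outcome"):
    """Least squares under one linear equality constraint between two groups.

    Hypothetical helper for illustration, not the paper's code.
    s marks group membership (0/1) for the controlled attribute.
      "mean_outcome": mean prediction is equal in both groups.
      "residual":     mean residual is equal in both groups.
    Both cases reduce to c @ w = d, solved with a Lagrange multiplier.
    Assumes X has full column rank and c is not the zero vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.asarray(s).astype(bool)

    # c @ w equals the gap in mean prediction between group 1 and group 0.
    c = X[s].mean(axis=0) - X[~s].mean(axis=0)
    if constraint == "mean_outcome":
        d = 0.0                              # mean(X1 w) - mean(X0 w) = 0
    elif constraint == "residual":
        d = y[s].mean() - y[~s].mean()       # mean(y1 - X1 w) = mean(y0 - X0 w)
    else:
        raise ValueError(f"unknown constraint: {constraint!r}")

    A = X.T @ X
    w_ols = np.linalg.solve(A, X.T @ y)      # unconstrained least-squares fit
    A_inv_c = np.linalg.solve(A, c)
    lam = (c @ w_ols - d) / (c @ A_inv_c)    # Lagrange multiplier for c @ w = d
    return w_ols - lam * A_inv_c             # constrained minimizer


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
n = 200
s = rng.integers(0, 2, size=n)
x = rng.normal(size=n) + 1.5 * s             # feature correlated with s
X = np.column_stack([np.ones(n), x])         # intercept + feature
y = 2.0 * x + 1.0 * s + rng.normal(scale=0.5, size=n)

w = constrained_ls(X, y, s, "mean_outcome")
pred = X @ w
print(pred[s == 1].mean() - pred[s == 0].mean())  # ~0 up to rounding error
```

Because the constraint is linear, the constrained solution is simply the OLS solution shifted along (X^T X)^{-1} c, so it costs one extra linear solve; the trade-off is a squared error that can only be equal to or larger than that of unconstrained OLS.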
Pages: 71-80
Page count: 10