Divide and Imitate: Multi-cluster Identification and Mitigation of Selection Bias

Cited by: 2
Authors
Dost, Katharina [1]
Duncanson, Hamish [1]
Ziogas, Ioannis [2]
Riddle, Patricia [1]
Wicker, Jorg [1]
Affiliations
[1] Univ Auckland, Auckland, New Zealand
[2] Univ Mississippi, Oxford, MS USA
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT II | 2022, Vol. 13281
DOI
10.1007/978-3-031-05936-0_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine Learning can help overcome human biases in decision making by focussing on purely logical conclusions based on the training data. If the training data is biased, however, that bias will be transferred to the model and remains undetected as the performance is validated on a test set drawn from the same biased distribution. Existing strategies for selection bias identification and mitigation generally rely on some sort of knowledge of the bias or the ground-truth. An exception is the Imitate algorithm that assumes no knowledge but comes with a strong limitation: It can only model datasets with one normally distributed cluster per class. In this paper, we introduce a novel algorithm, Mimic, which uses Imitate as a building block but relaxes this limitation. By allowing mixtures of multivariate Gaussians, our technique is able to model multi-cluster datasets and provide solutions for a substantially wider set of problems. Experiments confirm that Mimic not only identifies potential biases in multi-cluster datasets which can be corrected early on but also improves classifier performance.
Pages: 149-160
Number of pages: 12