On the relation of causality- versus correlation-based feature selection on model fairness

Times cited: 5
Author
Saarela, Mirka [1]
Affiliation
[1] Univ Jyvaskyla, Jyvaskyla, Finland
Source
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024 | 2024
Funding
Academy of Finland
Keywords
Feature Selection; Causality; Markov Blanket; IPCMB; Machine Learning Fairness; MARKOV BLANKET INDUCTION; LOCAL CAUSAL; DISCOVERY
DOI
10.1145/3605098.3636018
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
As machine learning models are increasingly used in the educational domain, it is imperative to ensure that they are fair and do not discriminate against certain groups or individuals. There have been a few recent attempts to ensure fairness in these models. However, most of the fairness literature tends to overlook the feature selection (FS) process, even though FS is one of the foundational steps in the machine learning pipeline. Moreover, traditional FS methods select features by examining the correlational relationships between the predictive features and the target variable, without trying to uncover causal connections between them. To address these issues, we use four openly available datasets: two educational datasets and two benchmark datasets regularly used in the fairness literature. On these datasets, we compare how the two approaches to FS (causality-based versus correlation-based) affect the performance and fairness of the resulting models. Our results show that causality-based FS generally leads to fairer models, whereas models built after correlation-based FS achieve higher predictive performance.
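The contrast between the two FS families can be illustrated with a toy example. A correlation-based filter ranks each feature by its marginal correlation with the target. Markov-blanket methods such as IPCMB instead rely on conditional-independence tests. The sketch below is a hypothetical illustration and is neither IPCMB nor the author's pipeline. It assumes a linear-Gaussian causal chain B <- A -> Y, so that partial correlation can serve as the conditional-independence test. The names `simulate`, `pearson`, and `partial_corr` are illustrative.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=5000, seed=0):
    """Hypothetical toy causal graph B <- A -> Y (an assumption, not real data).

    A directly causes the target Y. B is a noisy copy of A with no direct
    edge to Y, so the Markov blanket of Y is {A} and B lies outside it.
    """
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [ai + rng.gauss(0, 0.5) for ai in a]
    y = [ai + rng.gauss(0, 1) for ai in a]
    return a, b, y


a, b, y = simulate()

# Correlation-based filter: score each feature by |corr(feature, Y)|.
# B scores almost as high as A, so a top-k filter would keep both.
marginal = {"A": abs(pearson(a, y)), "B": abs(pearson(b, y))}

# Conditional-independence view: once A is known, B carries (almost) no
# further information about Y, but A stays dependent on Y given B.
b_given_a = partial_corr(b, y, a)
a_given_b = partial_corr(a, y, b)

print(marginal, round(b_given_a, 3), round(a_given_b, 3))
```

In this toy graph, a correlation filter would keep both A and B, whereas a Markov-blanket method would keep only A. Partial correlation is a valid conditional-independence test only under linear-Gaussian assumptions. Markov-blanket algorithms applied to discrete data typically use other tests, such as G² or chi-squared tests.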
Pages: 56-64
Number of pages: 9