On the relation of causality- versus correlation-based feature selection on model fairness

Cited by: 5

Authors:

Saarela, Mirka [1]

Affiliations:

[1] Univ Jyvaskyla, Jyvaskyla, Finland

Source:

39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024 | 2024

Funding:

Academy of Finland;

Keywords:

Feature Selection; Causality; Markov Blanket; IPCMB; Machine Learning Fairness; MARKOV BLANKET INDUCTION; LOCAL CAUSAL; DISCOVERY;

DOI:

10.1145/3605098.3636018

Chinese Library Classification number:

TP39 [Computer applications];

Subject classification codes:

081203 ; 0835 ;

Abstract:

As machine learning models are used increasingly in the educational domain, ensuring that they are fair and do not discriminate against certain groups or individuals is imperative. Although there are a few recent attempts to ensure fairness in these models, the majority of fairness literature tends to overlook the feature selection (FS) process despite its critical role as one of the foundational steps in the machine learning pipeline. Moreover, traditional FS methods identify features by examining the correlational relationships between predictive features and the target variable without seeking to uncover causal connections between them. To address these issues, we compare, on four openly available datasets (two educational ones and two benchmark datasets regularly used in the fairness literature), the impact of these two different ways of FS (i.e., causality- versus correlation-based) on the performance and fairness of the resulting models. Our results show that causality-based FS generally leads to fairer models, while the models built after correlation-based FS manifest higher performance.

Pages: 56-64

Page count: 9