Feature Selection Under Fairness and Performance Constraints

Cited by: 2
Authors
Dorleon, Ginel [1]
Megdiche, Imen [1]
Bricon-Souf, Nathalie [1]
Teste, Olivier [1]
Affiliations
[1] Toulouse Inst Comp Sci Res IRIT, Toulouse, France
Source
BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2022 | 2022 / Vol. 13428
Keywords
Feature selection; Fairness; Protected features; Bias; Machine learning; RELEVANCE;
DOI
10.1007/978-3-031-12670-3_11
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is an essential preprocessing step in data analysis. It selects a subset of relevant features to improve prediction performance and make the data easier to understand. However, traditional feature selection methods have limited ability to handle the data distribution over protected features: because of data imbalance, protected features often end up among those selected. Two problems can arise when current feature selection methods are applied to data containing protected features: protected features appear among the selected ones, which often leads to unfair results, and redundant features are selected that carry potentially the same information as the protected ones. To address these issues, this paper introduces a fair feature selection method that takes into account protected features and the features redundant with them. The method finds a set of relevant features that contains no protected features and has the least possible redundancy, subject to a prediction-quality constraint. This constraint expresses a tradeoff between fairness and prediction performance. Experiments on well-known biased datasets from the literature show that the proposed method outperforms the traditional feature selection methods under comparison in terms of both performance and fairness.
Pages: 125-130
Page count: 6
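
The abstract describes excluding protected features and features redundant with them before keeping the most relevant ones. The sketch below illustrates that general idea only; it is not the authors' algorithm. It assumes absolute Pearson correlation as a stand-in for both redundancy and relevance, and it omits the prediction-quality constraint entirely. The function name `fair_select`, the parameters `redundancy_threshold` and `k`, and the synthetic columns `sex`, `proxy`, `skill`, and `noise` are all hypothetical.

```python
import numpy as np


def fair_select(X, y, names, protected, redundancy_threshold=0.5, k=2):
    """Illustrative filter (hypothetical, not the paper's method).

    1. Drop the protected columns.
    2. Drop columns whose |Pearson r| with any protected column reaches
       redundancy_threshold (treated here as "redundant").
    3. Rank the rest by |Pearson r| with the target and keep the top k.
    """
    names = list(names)
    prot_idx = [names.index(p) for p in protected]
    candidates = [j for j in range(X.shape[1]) if j not in prot_idx]

    def abs_corr(a, b):
        return abs(np.corrcoef(a, b)[0, 1])

    kept = []
    for j in candidates:
        # Strongest association with any protected column (0.0 if there are none).
        redundancy = max((abs_corr(X[:, j], X[:, p]) for p in prot_idx), default=0.0)
        if redundancy < redundancy_threshold:
            kept.append(j)

    # Most relevant (most target-correlated) features first.
    kept.sort(key=lambda j: abs_corr(X[:, j], y), reverse=True)
    return [names[j] for j in kept[:k]]


# Synthetic data, invented for this example only.
rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, size=n).astype(float)    # protected feature
proxy = sex + rng.normal(0.0, 0.1, size=n)        # near-copy of the protected feature
skill = rng.normal(0.0, 1.0, size=n)              # legitimate predictor
noise = rng.normal(0.0, 1.0, size=n)              # irrelevant feature
y = skill + 0.5 * sex + rng.normal(0.0, 0.1, size=n)

X = np.column_stack([sex, proxy, skill, noise])
names = ["sex", "proxy", "skill", "noise"]
selected = fair_select(X, y, names, protected=["sex"])
print(selected)
```

In this toy setup, `proxy` is removed because it tracks `sex` closely, even though it is not itself declared protected. A fuller treatment would replace the fixed threshold with a check that removing a feature keeps prediction quality within an acceptable bound, which is the tradeoff the abstract refers to.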