Missing the missing values: The ugly duckling of fairness in machine learning

Cited by: 31
Authors
Fernando Martinez-Plumed [1,2,3]
Cesar Ferri [2]
David Nieves [2]
Jose Hernandez-Orallo [2,3]
Affiliations
[1] European Commission, Joint Research Centre (JRC), Seville, Spain
[2] Universitat Politècnica de València, Valencian Research Institute for Artificial Intelligence (VRAIN), Valencia, Spain
[3] University of Cambridge, Leverhulme Centre for the Future of Intelligence, Cambridge, England
Keywords
algorithmic bias; confirmation bias; data imputation; fairness; missing values; sample bias; survey bias; validity; models; view; bias
DOI
10.1002/int.22415
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
There is increasing concern in machine learning about the causes underlying unfair decision making, that is, algorithmic decisions that discriminate against some groups relative to others, especially groups defined by protected attributes such as gender, race and nationality. Missing values are one frequent manifestation of these latent causes: protected groups are more reluctant to give information that could be used against them, sensitive information for some groups can be erased by human operators, or data acquisition may simply be less complete and systematic for minority groups. However, most recent techniques, libraries and experimental results dealing with fairness in machine learning have simply ignored missing data. In this paper, we present the first comprehensive analysis of the relation between missing values and algorithmic fairness in machine learning: (1) we analyse the sources of missing data and bias, mapping their common causes; (2) we find that rows containing missing values are usually fairer than the rest, which should discourage treating missing values as the uncomfortable, ugly data that techniques and libraries for handling algorithmic bias discard at the first opportunity; (3) we study the trade-off between performance and fairness when the rows with missing values are used (either because the technique handles them directly or through imputation methods); and (4) we show that the sensitivity of six different machine learning techniques to missing values is usually low, which reinforces the view that rows with missing data contribute more to fairness through their other, nonmissing attributes. We conclude with a series of recommended procedures for handling missing data when aiming for fair decision making.
Pages
3217-3258 (42 pages)
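As an illustration of point (2) in the abstract, here is a minimal sketch that computes one common group-fairness metric, statistical parity difference (SPD), on the observed labels of complete rows versus rows with a missing value. It also contrasts listwise deletion with simple mean imputation. Everything in it is an assumption for illustration: the toy records, the single numeric feature, group "a" as the privileged group, and label 1 as the favourable outcome are all hypothetical, and the paper's own metrics, datasets and protocol may differ.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical record layout: (protected_group, feature, label).
# None marks a missing feature value; label 1 is assumed to be the favourable outcome.
Row = Tuple[str, Optional[float], int]


def statistical_parity_difference(rows: List[Row], privileged: str = "a") -> float:
    """P(label=1 | unprivileged) - P(label=1 | privileged); 0.0 means parity."""
    priv = [y for g, _, y in rows if g == privileged]
    unpriv = [y for g, _, y in rows if g != privileged]
    if not priv or not unpriv:
        raise ValueError("both groups must be present to compare rates")
    return float(mean(unpriv) - mean(priv))


def split_by_missingness(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate complete rows from rows whose feature value is missing."""
    complete = [r for r in rows if r[1] is not None]
    incomplete = [r for r in rows if r[1] is None]
    return complete, incomplete


def mean_impute(rows: List[Row]) -> List[Row]:
    """Replace missing feature values with the mean of the observed ones."""
    fill = mean(x for _, x, _ in rows if x is not None)
    return [(g, fill if x is None else x, y) for g, x, y in rows]


# Hypothetical toy data, chosen only to illustrate the computation (not from the paper).
ROWS: List[Row] = [
    ("a", 1.0, 1), ("a", 2.0, 1), ("a", 3.0, 1), ("a", 4.0, 0),
    ("b", 1.5, 0), ("b", 2.5, 0), ("b", 3.5, 1), ("b", 0.5, 0),
    ("a", None, 1), ("a", None, 0), ("b", None, 1), ("b", None, 0),
]

if __name__ == "__main__":
    complete, incomplete = split_by_missingness(ROWS)
    imputed = mean_impute(ROWS)
    print("SPD, complete rows only :", statistical_parity_difference(complete))
    print("SPD, rows with missing  :", statistical_parity_difference(incomplete))
    print("SPD, all rows (imputed) :", statistical_parity_difference(imputed))
    print("rows kept by deletion vs imputation:", len(complete), len(imputed))
```

On this toy data the complete rows show a large gap between groups while the rows with missing values show parity, so dropping the incomplete rows widens the measured gap; imputation keeps all twelve rows. Mean imputation is used only because it is the simplest option, not because it is the authors' recommended procedure.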