Data Pre-Processing for Discrimination Prevention: Information-Theoretic Optimization and Analysis

Cited by: 19
Authors
Calmon, Flavio du Pin [1]
Wei, Dennis [2]
Vinzamuri, Bhanukiran [2]
Ramamurthy, Karthikeyan Natesan [2]
Varshney, Kush R. [2]
Affiliations
[1] Harvard Univ, John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
[2] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
Machine learning; ethics; optimization
DOI
10.1109/JSTSP.2018.2865887
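Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract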
Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling group discrimination, limiting distortion in individual data samples, and preserving utility. Several theoretical properties are established, including conditions for convexity, a characterization of the impact of limited sample size on discrimination and utility guarantees, and a connection between discrimination and estimation. Two instances of the proposed optimization are applied to datasets, including one on real-world criminal recidivism. Results show that discrimination can be greatly reduced at a small cost in classification accuracy and with precise control of individual distortion.
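One way to read the three goals in the abstract is as a constrained optimization over a randomized mapping from original to transformed data. The sketch below is schematic only. Its notation is an assumption made for illustration and is not taken from this record: D is a protected attribute, X the remaining features, Y the outcome, (X̂, Ŷ) the transformed features and outcome, Δ a dissimilarity between distributions (utility), J a measure of dependence of the transformed outcome on D (group discrimination), δ a per-sample distortion, and ε and c are user-chosen thresholds.

```latex
\begin{aligned}
\min_{p_{\hat{X},\hat{Y}\mid X,Y,D}} \quad
  & \Delta\bigl(p_{\hat{X},\hat{Y}},\, p_{X,Y}\bigr)
  && \text{(preserve utility)} \\
\text{s.t.} \quad
  & J\bigl(p_{\hat{Y}\mid D}(y \mid d_1),\, p_{\hat{Y}\mid D}(y \mid d_2)\bigr) \le \epsilon
  \quad \forall\, y,\ d_1,\ d_2
  && \text{(control group discrimination)} \\
  & \mathbb{E}\bigl[\delta\bigl((x,y),(\hat{X},\hat{Y})\bigr) \,\big|\, D=d,\ X=x,\ Y=y\bigr] \le c
  \quad \forall\, d,\ x,\ y
  && \text{(limit individual distortion)} \\
  & p_{\hat{X},\hat{Y}\mid X,Y,D} \ \text{is a valid conditional distribution.}
\end{aligned}
```

Under this reading, the convexity conditions mentioned in the abstract would concern the choices of Δ and J, since the distortion constraint is linear in the mapping being optimized.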
Pages: 1106-1119
Page count: 14