Automated Directed Fairness Testing

Cited by: 112
Authors
Udeshi, Sakshi [1 ]
Arora, Pryanshu [2 ]
Chattopadhyay, Sudipta [1 ]
Affiliations
[1] Singapore Univ Tech & Design, Singapore, Singapore
[2] BITS Pilani, Pilani, Rajasthan, India
Source
PROCEEDINGS OF THE 2018 33RD IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE '18) | 2018
Keywords
Software Fairness; Directed Testing; Machine Learning;
DOI
10.1145/3238147.3238165
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Fairness is a critical trait in decision making. As machine-learning models are increasingly used in sensitive application domains (e.g., education and employment) for decision making, it is crucial that the decisions computed by such models are free of unintended bias. But how can we automatically validate the fairness of arbitrary machine-learning models? For a given machine-learning model and a set of sensitive input parameters, our AEQUITAS approach automatically discovers discriminatory inputs that highlight fairness violations. At the core of AEQUITAS are three novel strategies that employ probabilistic search over the input space with the objective of uncovering fairness violations. AEQUITAS leverages the inherent robustness property of common machine-learning models to design and implement scalable test-generation methodologies. An appealing feature of the generated test inputs is that they can be systematically added to the training set of the underlying model to improve its fairness. To this end, we design a fully automated module that is guaranteed to improve the fairness of the model. We implemented AEQUITAS and evaluated it on six state-of-the-art classifiers, including one that was designed with fairness in mind. We show that AEQUITAS effectively generates inputs that uncover fairness violations in all the subject classifiers and systematically improves the fairness of the respective models using the generated test inputs. In our evaluation, AEQUITAS generates up to 70% discriminatory inputs (w.r.t. the total number of inputs generated) and leverages these inputs to improve fairness by up to 94%.
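To make the abstract's search idea concrete, below is a minimal Python sketch (not the authors' implementation) of the two ingredients it describes: the discriminatory-input check underlying individual fairness, and a random global search phase over the input space. It assumes discrete attribute domains and a scikit-learn-style classifier exposing a predict method; all names (is_discriminatory, global_search, domains) are illustrative, not taken from the paper's artifact.

```python
import random

def is_discriminatory(model, x, sensitive_idx, sensitive_values):
    """x witnesses a fairness violation if changing ONLY its
    sensitive attribute flips the model's decision."""
    base = model.predict([x])[0]
    for v in sensitive_values:
        if v == x[sensitive_idx]:
            continue
        x_prime = list(x)
        x_prime[sensitive_idx] = v  # perturb the sensitive attribute only
        if model.predict([x_prime])[0] != base:
            return True
    return False

def global_search(model, domains, sensitive_idx, n_samples=1000, seed=0):
    """Sample the input space uniformly and collect discriminatory seeds.
    AEQUITAS-style testing would follow this with a local phase that
    perturbs the non-sensitive attributes of each seed, exploiting model
    robustness (nearby inputs tend to receive the same label)."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(n_samples):
        x = [rng.choice(domain) for domain in domains]
        if is_discriminatory(model, x, sensitive_idx, domains[sensitive_idx]):
            seeds.append(x)
    return seeds
```

Under this reading, the collected seeds would feed both the local search and the retraining module the abstract mentions, which adds discriminatory inputs back into the training set to improve fairness.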
Pages: 98 - 108
Page count: 11