Explaining Deep Learning Models with Constrained Adversarial Examples

Cited by: 22
Authors
Moore, Jonathan [1 ]
Hammerla, Nils [1 ]
Watkins, Chris [2 ]
Affiliations
[1] Babylon Health, London SW3 3DD, England
[2] Royal Holloway, University of London, Egham, Surrey, England
Source
PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I | 2019, Vol. 11670
Keywords
Explainable AI; Adversarial examples; Counterfactual explanations; INTERPRETABILITY;
DOI
10.1007/978-3-030-29908-8_4
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning models generally suffer from a lack of explainability: given a classification result, it is typically hard to determine what caused the decision and to give an informative explanation. We explore a new method of generating counterfactual explanations which, instead of explaining why a particular classification was made, explains how a different outcome can be achieved. This gives the recipient of the explanation a better way to understand the result and provides an actionable suggestion. We show that the introduced method of Constrained Adversarial Examples (CADEX) can be used in real-world applications, and that it yields explanations which incorporate business or domain constraints, such as handling categorical attributes and range constraints.
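The abstract describes CADEX only at a high level: a gradient-based search for an adversarial example, restricted so the perturbation respects domain constraints. The following is a minimal sketch of that idea in PyTorch; it is not the authors' implementation, and the function name, the `mask`/`lo`/`hi` arguments, and the step budget are illustrative assumptions:

```python
import torch

def constrained_counterfactual(model, x, target_class, mask, lo, hi,
                               steps=200, lr=0.05):
    """Search for a counterfactual of x: a nearby input that the model
    assigns to target_class, changing only features allowed by `mask`
    (a 0/1 tensor shaped like x) and keeping every feature in [lo, hi].
    Illustrative sketch, not the CADEX reference implementation."""
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    target = torch.tensor([target_class])

    for _ in range(steps):
        logits = model(x_cf.unsqueeze(0))   # assumes x is a 1-D feature vector
        if logits.argmax(dim=1).item() == target_class:
            break                           # desired outcome reached
        optimizer.zero_grad()
        loss_fn(logits, target).backward()
        x_cf.grad *= mask                   # freeze immutable attributes
        optimizer.step()
        with torch.no_grad():
            x_cf.clamp_(lo, hi)             # enforce per-feature range constraints
    return x_cf.detach()
```

A complete implementation along the paper's lines would also handle one-hot encoded categorical attributes, for example by projecting each categorical group back to a valid single-category encoding after every update step.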
Pages: 43-56
Page count: 14