Explaining Deep Learning Models with Constrained Adversarial Examples

Cited by: 22
Authors
Moore, Jonathan [1 ]
Hammerla, Nils [1 ]
Watkins, Chris [2 ]
Affiliations
[1] Babylon Health, London SW3 3DD, England
[2] Royal Holloway, University of London, Egham, Surrey, England
Source
PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I | 2019, Vol. 11670
Keywords
Explainable AI; Adversarial examples; Counterfactual explanations; INTERPRETABILITY;
DOI
10.1007/978-3-030-29908-8_4
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning models generally suffer from a lack of explainability: given a classification result, it is typically hard to determine what caused the decision and to give an informative explanation. We explore a new method of generating counterfactual explanations which, instead of explaining why a particular classification was made, explains how a different outcome can be achieved. This gives the recipient of the explanation a better way to understand the result and provides an actionable suggestion. We show that the introduced method of Constrained Adversarial Examples (CADEX) can be used in real-world applications, and that it yields explanations which incorporate business or domain constraints, such as handling categorical attributes and range constraints.
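The abstract describes CADEX only at a high level: a gradient-based search for an adversarial example, restricted so the perturbation respects domain constraints. The following is a minimal sketch of that idea in PyTorch; it is not the authors' implementation, and the function name, the `mask`/`lo`/`hi` arguments, and the step budget are illustrative assumptions:

```python
import torch

def constrained_counterfactual(model, x, target_class, mask, lo, hi,
                               steps=200, lr=0.05):
    """Search for a counterfactual of x: a nearby input that the model
    assigns to target_class, changing only features allowed by `mask`
    (a 0/1 tensor shaped like x) and keeping every feature in [lo, hi].
    Illustrative sketch, not the CADEX reference implementation."""
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    target = torch.tensor([target_class])

    for _ in range(steps):
        logits = model(x_cf.unsqueeze(0))   # assumes x is a 1-D feature vector
        if logits.argmax(dim=1).item() == target_class:
            break                           # desired outcome reached
        optimizer.zero_grad()
        loss_fn(logits, target).backward()
        x_cf.grad *= mask                   # freeze immutable attributes
        optimizer.step()
        with torch.no_grad():
            x_cf.clamp_(lo, hi)             # enforce per-feature range constraints
    return x_cf.detach()
```

A complete implementation along the paper's lines would also handle one-hot encoded categorical attributes, for example by projecting each categorical group back to a valid single-category encoding after every update step.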
Pages: 43-56
Page count: 14