Auditing black-box models for indirect influence

Cited by: 117
Authors
Adler, Philip [1 ]
Falk, Casey [1 ]
Friedler, Sorelle A. [1 ]
Nix, Tionney [1 ]
Rybeck, Gabriel [1 ]
Scheidegger, Carlos [2 ]
Smith, Brandon [1 ]
Venkatasubramanian, Suresh [3 ]
Affiliations
[1] Haverford College, Department of Computer Science, Haverford, PA 19041, USA
[2] University of Arizona, Department of Computer Science, Tucson, AZ 85721, USA
[3] University of Utah, Department of Computer Science, Salt Lake City, UT 84112, USA
Funding
US National Science Foundation
Keywords
Black-box auditing; ANOVA; Algorithmic accountability; Deep learning; Discrimination-aware data mining; Feature influence; Interpretable machine learning
DOI
10.1007/s10115-017-1116-3
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Data-trained predictive models see widespread use, but for the most part they are used as black boxes that output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular of how different features influence the model prediction. This is important when interpreting the behavior of complex models or asserting that certain problematic attributes (such as race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models, which lets us study the extent to which existing models take advantage of particular features in the data set, without knowing how the models work. Our work focuses on the problem of indirect influence: how some features might indirectly influence outcomes via other, related features. As a result, we can find attribute influences even in cases where, upon further direct examination of the model, the attribute is not referred to by the model at all. Our approach does not require the black-box model to be retrained. This is important if, for example, the model is only accessible via an API, and it contrasts our work with other methods that investigate feature influence, such as feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available data sets and models. We also validate our procedure using techniques from interpretable learning and feature selection, as well as against other black-box auditing procedures. To further demonstrate the effectiveness of this technique, we use it to audit a black-box recidivism prediction algorithm.
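To make the procedure concrete, here is a minimal Python sketch of the obscure-and-compare idea the abstract describes: remove the audited attribute's signal from the data, then measure how much the fixed, unretrained model's accuracy drops. The function `obscure_feature`, the per-group quantile repair, and the synthetic task are simplified illustrative assumptions, not the authors' released implementation.

```python
# Sketch of black-box influence auditing: "obscure" an audited attribute by
# removing its signal from every column, then measure the accuracy drop of
# the *unretrained* model. Illustrative only; not the authors' code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

def obscure_feature(X, audited, n_quantiles=100):
    """Return a copy of X in which no column predicts column `audited`
    (assumed binary). Each other column is mapped, group by group, onto
    the pooled quantile curve, so its distribution becomes identical
    across groups -- a simplified distribution repair."""
    Xr = X.copy()
    groups = X[:, audited]
    grid = np.linspace(0.0, 1.0, n_quantiles)
    for j in range(X.shape[1]):
        if j == audited:
            continue
        pooled = np.quantile(X[:, j], grid)  # pooled target distribution
        for g in np.unique(groups):
            mask = groups == g
            # Normalized within-group ranks in [0, 1].
            ranks = np.argsort(np.argsort(X[mask, j])) / max(mask.sum() - 1, 1)
            Xr[mask, j] = np.interp(ranks, grid, pooled)
    # Make the audited column itself uninformative as well; if the model
    # never used it directly, any accuracy drop is purely indirect influence.
    Xr[:, audited] = Xr[:, audited].mean()
    return Xr

# A stand-in "black box"; in practice this could be an opaque scoring API.
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X[:, 0] = (X[:, 0] > 0).astype(float)  # column 0 plays the audited attribute
model = GradientBoostingClassifier(random_state=0).fit(X, y)

base_acc = accuracy_score(y, model.predict(X))
obscured_acc = accuracy_score(y, model.predict(obscure_feature(X, audited=0)))
print(f"accuracy drop for feature 0 (influence proxy): {base_acc - obscured_acc:.3f}")
```

Because the model is only queried, never retrained, the same loop works when predictions come from a remote API; repeating it for each feature and ranking the accuracy drops yields an influence ordering. The paper's full method also handles categorical features and partial degrees of obscuring, which this sketch omits.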
Pages: 95-122
Page count: 28
Related Papers
50 records in total
  • [1] Auditing black-box models for indirect influence
    Adler, Philip
    Falk, Casey
    Friedler, Sorelle A.
    Nix, Tionney
    Rybeck, Gabriel
    Scheidegger, Carlos
    Smith, Brandon
    Venkatasubramanian, Suresh
    Knowledge and Information Systems, 2018, 54: 95-122
  • [2] Explaining Black-Box Models for Biomedical Text Classification
    Moradi, Milad
    Samwald, Matthias
    IEEE Journal of Biomedical and Health Informatics, 2021, 25(8): 3112-3120
  • [3] Building Uncertainty Models on Top of Black-Box Predictive APIs
    Brando, Axel
    Torres-Latorre, Clara
    Rodriguez-Serrano, Jose A.
    Vitria, Jordi
    IEEE Access, 2020, 8: 121344-121356
  • [4] Interpretability as Approximation: Understanding Black-Box Models by Decision Boundary
    Dong, Hangcheng
    Liu, Bingguo
    Ye, Dong
    Liu, Guodong
    Electronics, 2024, 13(22)
  • [5] Orthogonal Deep Models as Defense Against Black-Box Attacks
    Jalwana, Mohammad A. A. K.
    Akhtar, Naveed
    Bennamoun, Mohammed
    Mian, Ajmal
    IEEE Access, 2020, 8: 119744-119757
  • [6] Understanding the black-box: towards interpretable and reliable deep learning models
    Qamar, Tehreem
    Bawany, Narmeen Zakaria
    PeerJ Computer Science, 2023, 9
  • [7] Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models
    Welchowski, Thomas
    Maloney, Kelly O.
    Mitchell, Richard
    Schmid, Matthias
    Journal of Agricultural, Biological and Environmental Statistics, 2022, 27(1): 175-197
  • [8] Targeted Black-Box Adversarial Attack Method for Image Classification Models
    Zheng, Su
    Chen, Jialin
    Wang, Lingli
    2019 International Joint Conference on Neural Networks (IJCNN), 2019
  • [9] Automatic Selection Attacks Framework for Hard Label Black-Box Models
    Liu, Xiaolei
    Li, Xiaoyu
    Zheng, Desheng
    Bai, Jiayu
    Peng, Yu
    Zhang, Shibin
    IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2022
  • [10] Hybrid Predictive Models: When an Interpretable Model Collaborates with a Black-box Model
    Wang, Tong
    Lin, Qihang
    Journal of Machine Learning Research, 2021, 22