Exploring Evaluation Methods for Interpretable Machine Learning: A Survey

Cited by: 8
Authors
Alangari, Nourah [1 ]
Menai, Mohamed El Bachir [1 ]
Mathkour, Hassan [1 ]
Almosallam, Ibrahim [2 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
[2] Saudi Informat Technol Co SITE, Riyadh 12382, Saudi Arabia
Keywords
interpretability; explainable AI; evaluating interpretability; black-box; rules; classification; accuracy; issues
DOI
10.3390/info14080469
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Recent progress in machine learning has enabled decision support systems whose predictive accuracy surpasses human performance in certain scenarios. This improvement, however, has come at the cost of increased model complexity, yielding black-box models that hide their internal logic from users. Because these black boxes are designed primarily to optimize predictive accuracy, their applicability is limited in critical domains such as medicine, law, and finance, where both accuracy and interpretability are crucial for model acceptance. Despite the growing body of research on interpretability, there remains a significant dearth of evaluation methods for the proposed approaches. This survey sheds light on the various methods employed to evaluate interpretability approaches. Two primary procedures prevail in the literature: qualitative and quantitative evaluation. Qualitative evaluation relies on human assessment, whereas quantitative evaluation uses computational metrics. Human evaluation commonly takes the form of either researcher intuition or well-designed experiments; however, it is susceptible to human bias and fatigue and cannot adequately compare two models. Consequently, human evaluation has recently declined in use, while computational metrics have gained prominence as a more rigorous way to compare and assess different approaches. These metrics target specific goals, such as fidelity, comprehensibility, or stability, yet existing metrics often struggle to scale or to transfer across different types of model outputs and alternative approaches. A further issue is that the results of evaluating interpretability methods may not always be accurate. For instance, relying on the drop in predicted probability to assess fidelity can be problematic, particularly in the face of out-of-distribution data: masking or removing features can produce inputs unlike anything seen during training, so a change in probability may reflect distribution shift rather than genuine feature importance. Finally, a fundamental challenge in the interpretability domain is the lack of consensus on its definition and requirements. This issue is compounded in the evaluation process and becomes particularly apparent when assessing comprehensibility.
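Since the abstract singles out the deletion-style probability-drop fidelity check and its out-of-distribution pitfall, a minimal sketch may help make the mechanism concrete. This is an illustrative reconstruction, not the survey's own protocol: the function name probability_drop, the zero baseline, and the scikit-learn-style predict_proba interface are all assumptions.

```python
# Minimal sketch of a deletion-style fidelity metric: occlude the
# top-ranked features of an explanation and record the drop in the
# predicted-class probability. Names and the masking baseline are
# illustrative assumptions, not the survey's prescribed procedure.
import numpy as np

def probability_drop(predict_proba, x, feature_ranking, k, baseline=0.0):
    """Fidelity proxy: drop in predicted-class probability after
    masking the k features an explanation ranks as most important.

    predict_proba   : callable mapping a (1, d) array to class probabilities
    x               : 1-D feature vector being explained
    feature_ranking : feature indices, most important first
    k               : number of top-ranked features to occlude
    baseline        : value used to "remove" a feature; substituting a
                      constant can push the input out of distribution,
                      which is exactly the pitfall noted in the abstract
    """
    x = np.asarray(x, dtype=float)
    p_orig = predict_proba(x.reshape(1, -1))[0]
    target = int(np.argmax(p_orig))  # originally predicted class

    x_masked = x.copy()
    x_masked[np.asarray(feature_ranking[:k], dtype=int)] = baseline
    p_masked = predict_proba(x_masked.reshape(1, -1))[0]

    # A large drop suggests the explanation found features the model
    # truly relies on -- assuming the masked input remains plausible.
    return p_orig[target] - p_masked[target]
```

With a scikit-learn classifier, clf.predict_proba would play the role of predict_proba here; a steeper drop at small k indicates a more faithful ranking, but only to the extent that the masked inputs stay in distribution.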
Pages: 29
Related Papers
50 items in total
  • [41] Machine Learning Methods for Credit Card Fraud Detection: A Survey
    Dastidar, Kanishka Ghosh
    Caelen, Olivier
    Granitzer, Michael
    IEEE ACCESS, 2024, 12 : 158939 - 158965
  • [42] Machine learning based methods for software fault prediction: A survey
    Pandey, Sushant Kumar
    Mishra, Ravi Bhushan
    Tripathi, Anil Kumar
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 172
  • [43] Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics
    Zhou, Jianlong
    Gandomi, Amir H.
    Chen, Fang
    Holzinger, Andreas
    ELECTRONICS, 2021, 10 (05) : 1 - 19
  • [44] How Interpretable Machine Learning Can Benefit Process Understanding in the Geosciences
    Jiang, Shijie
    Sweet, Lily-belle
    Blougouras, Georgios
    Brenning, Alexander
    Li, Wantong
    Reichstein, Markus
    Denzler, Joachim
    Wei, Shangguan
    Yu, Guo
    Huang, Feini
    Zscheischler, Jakob
    EARTHS FUTURE, 2024, 12 (07)
  • [45] Hessian-based toolbox for reliable and interpretable machine learning in physics
    Dawid, Anna
    Huembeli, Patrick
    Tomza, Michal
    Lewenstein, Maciej
    Dauphin, Alexandre
MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2022, 3 (01)
  • [46] Interpretable Machine Learning - A Brief History, State-of-the-Art and Challenges
    Molnar, Christoph
    Casalicchio, Giuseppe
    Bischl, Bernd
    ECML PKDD 2020 WORKSHOPS, 2020, 1323 : 417 - 431
  • [47] Interpretable machine learning for time-to-event prediction in medicine and healthcare
    Baniecki, Hubert
    Sobieski, Bartlomiej
    Szatkowski, Patryk
    Bombinski, Przemyslaw
    Biecek, Przemyslaw
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2025, 159
  • [48] A critical moment in machine learning in medicine: on reproducible and interpretable learning
    Ciobanu-Caraus, Olga
    Aicher, Anatol
    Kernbach, Julius M.
    Regli, Luca
    Serra, Carlo
    Staartjes, Victor E.
    ACTA NEUROCHIRURGICA, 2024, 166 (01)
  • [49] An Empirical Evaluation of Machine Learning Methods for the Insurance Industry
    Dammann, Michael
    Gnoss, Nicolai
    Kunert, Pamela
    Ramcke, Eike-Christian
    Schreier, Tobias
    Steffens, Ulrike
    Zukunft, Olaf
    PROCEEDINGS OF SIXTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICICT 2021), VOL 2, 2022, 236 : 933 - 941
  • [50] Performance Evaluation of Machine Learning Methods in Cultural Modeling
    Li, Xiao-Chen
    Mao, Wen-Ji
    Zeng, Da-Jun
    Su, Peng
    Wang, Fei-Yue
    JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2009, 24 (06) : 1010 - 1017