Exploring Evaluation Methods for Interpretable Machine Learning: A Survey

Cited by: 8
Authors
Alangari, Nourah [1 ]
Menai, Mohamed El Bachir [1 ]
Mathkour, Hassan [1 ]
Almosallam, Ibrahim [2 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
[2] Saudi Informat Technol Co SITE, Riyadh 12382, Saudi Arabia
Keywords
interpretability; explainable AI; evaluating interpretability; black-box; rules; classification; accuracy; issues
DOI
10.3390/info14080469
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Recent progress in machine learning has enabled decision support systems whose predictive accuracy surpasses human performance in certain scenarios. This improvement, however, has come at the cost of increased model complexity, yielding black-box models whose internal logic is hidden from users. Because these black boxes are designed primarily to optimize predictive accuracy, their applicability is limited in critical domains such as medicine, law, and finance, where both accuracy and interpretability are crucial for model acceptance. Despite the growing body of research on interpretability, evaluation methods for the proposed approaches remain scarce. This survey sheds light on the evaluation methods employed for interpreting models. Two primary procedures are prevalent in the literature: qualitative and quantitative evaluation. Qualitative evaluation relies on human assessment, while quantitative evaluation uses computational metrics. Human evaluation commonly takes the form of either researcher intuition or well-designed experiments; however, it is susceptible to human bias and fatigue and cannot adequately compare two models. Consequently, the use of human evaluation has recently declined, and computational metrics have gained prominence as a more rigorous way to compare and assess different approaches. These metrics are designed to serve specific goals, such as fidelity, comprehensibility, or stability, but they often face challenges when scaled or applied to different types of model outputs and alternative approaches. A further concern is that the results of evaluating interpretability methods may not always be accurate: for instance, relying on the drop in predicted probability to assess fidelity can be problematic, particularly in the presence of out-of-distribution data. Finally, a fundamental challenge in the interpretability domain is the lack of consensus on its definition and requirements, an issue that is compounded in the evaluation process and becomes particularly apparent when assessing comprehensibility.
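The deletion-style fidelity test mentioned in the abstract can be made concrete with a short sketch. The snippet below is an illustrative implementation, not the survey's own protocol: it assumes a scikit-learn-style classifier exposing `predict_proba`, and the `baseline_value` used to "delete" features is a hypothetical choice. The inline comments mark where the out-of-distribution caveat bites.

```python
# A minimal sketch of a deletion-based fidelity metric: mask the features
# an explanation ranks as most important and measure the drop in the
# model's predicted probability for the explained class.

import numpy as np

def probability_drop_fidelity(model, x, attributions, k, baseline_value=0.0):
    """Fidelity as the drop in predicted class probability after
    replacing the k most-attributed features with a baseline value.

    model          -- classifier with a scikit-learn-style predict_proba
    x              -- 1-D feature vector
    attributions   -- importance score per feature, same shape as x
    k              -- number of top-ranked features to mask
    baseline_value -- value used to 'delete' a feature (an assumption;
                      the choice itself can push inputs off-distribution)
    """
    x = np.asarray(x, dtype=float)
    att = np.asarray(attributions, dtype=float)

    probs = model.predict_proba(x.reshape(1, -1))[0]
    target = int(np.argmax(probs))      # class being explained
    p_before = probs[target]

    top_k = np.argsort(np.abs(att))[::-1][:k]
    x_masked = x.copy()
    x_masked[top_k] = baseline_value    # caveat: the masked input may lie
                                        # outside the training distribution,
                                        # so the measured drop can reflect
                                        # model brittleness rather than
                                        # explanation quality
    p_after = model.predict_proba(x_masked.reshape(1, -1))[0][target]
    return p_before - p_after           # larger drop = higher apparent fidelity
```

In practice such a score would be averaged over a test set and used to rank competing explainers; the abstract's point is that this ranking is only trustworthy insofar as the masked inputs remain in-distribution, which is why the choice of `baseline_value` (zero, mean, or a sampled replacement) matters.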
Pages: 29
Related Papers
50 items in total
  • [21] Interpretable machine learning for weather and climate prediction: A review
    Yang, Ruyi
    Hu, Jingyu
    Li, Zihao
    Mu, Jianli
    Yu, Tingzhao
    Xia, Jiangjiang
    Li, Xuhong
    Dasgupta, Aritra
    Xiong, Haoyi
    ATMOSPHERIC ENVIRONMENT, 2024, 338
  • [22] Machine Learning and Deep Learning for Loan Prediction in Banking: Exploring Ensemble Methods and Data Balancing
    Sayed, Eslam Hussein
    Alabrah, Amerah
    Rahouma, Kamel Hussein
    Zohaib, Muhammad
    Badry, Rasha M.
    IEEE ACCESS, 2024, 12 : 193997 - 194019
  • [23] Survey and Evaluation of Hypertension Machine Learning Research
    du Toit, Clea
    Tran, Tran Quoc Bao
    Deo, Neha
    Aryal, Sachin
    Lip, Stefanie
    Sykes, Robert
    Manandhar, Ishan
    Sionakidis, Aristeidis
    Stevenson, Leah
    Pattnaik, Harsha
    Alsanosi, Safaa
    Kassi, Maria
    Le, Ngoc
    Rostron, Maggie
    Nichol, Sarah
    Aman, Alisha
    Nawaz, Faisal
    Mehta, Dhruven
    Tummala, Ramakumar
    McCallum, Linsay
    Reddy, Sandeep
    Visweswaran, Shyam
    Kashyap, Rahul
    Joe, Bina
    Padmanabhan, Sandosh
    JOURNAL OF THE AMERICAN HEART ASSOCIATION, 2023, 12 (09):
  • [24] Interpretable machine learning for materials design
    Dean, James
    Scheffler, Matthias
    Purcell, Thomas A. R.
    Barabash, Sergey V.
    Bhowmik, Rahul
    Bazhirov, Timur
    JOURNAL OF MATERIALS RESEARCH, 2023, 38 (20) : 4477 - 4496
  • [26] Study on Interpretable Surrogate Model for Power System Stability Evaluation Based on Machine Learning
    Han T.
    Chen J.
    Li Y.
    He G.
    Li H.
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2020, 40 (13) : 4122 - 4130
  • [27] Crime Prediction Methods Based on Machine Learning: A Survey
    Yin, Junxiang
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): : 4601 - 4629
  • [28] Thy-Wise: An interpretable machine learning model for the evaluation of thyroid nodules
    Jin, Zhe
    Pei, Shufang
    Ouyang, Lizhu
    Zhang, Lu
    Mo, Xiaokai
    Chen, Qiuying
    You, Jingjing
    Chen, Luyan
    Zhang, Bin
    Zhang, Shuixing
    INTERNATIONAL JOURNAL OF CANCER, 2022, 151 (12) : 2229 - 2243
  • [29] Survey of Machine Learning Methods for Big Data Applications
    Vinothini, A.
    Priya, S. Baghavathi
    2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017,
  • [30] Interpretable machine learning: Fundamental principles and 10 grand challenges
    Rudin, Cynthia
    Chen, Chaofan
    Chen, Zhi
    Huang, Haiyang
    Semenova, Lesia
    Zhong, Chudi
    STATISTICS SURVEYS, 2022, 16 : 1 - 85