Leveraging Feature Bias for Scalable Misprediction Explanation of Machine Learning Models

Cited by: 2
Authors
Gesi, Jiri [1]
Shen, Xinyun [1]
Geng, Yunfan [1]
Chen, Qihong [1]
Ahmed, Iftekhar [1]
Affiliations
[1] Univ Calif Irvine, Donald Bren Sch ICS, Irvine, CA 92717 USA
Source
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE | 2023
Keywords
machine learning; data imbalance; rule induction; misprediction explanation;
DOI
10.1109/ICSE48619.2023.00135
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Interpreting and debugging machine learning models is necessary to ensure their robustness, and explaining mispredictions can help significantly in doing so. While recent work on misprediction explanation has proven promising in generating interpretable explanations for mispredictions, state-of-the-art techniques "blindly" deduce misprediction explanation rules from all data features, which may not scale as the number of features grows. To alleviate this problem, we propose an efficient misprediction explanation technique named Bias Guided Misprediction Diagnoser (BGMD), which leverages two pieces of prior knowledge about data: a) data often exhibit highly skewed feature distributions and b) trained models in many cases perform poorly on subsets of the data with under-represented features. Next, we propose a technique named MAPS (Mispredicted Area UPweight Sampling). During model retraining, MAPS increases the weights of the data subsets that belong to the group prone to misprediction because it contains under-represented features. Thus, MAPS makes the retrained model pay more attention to the under-represented features. Our empirical study shows that BGMD outperforms the state-of-the-art misprediction diagnoser and reduces diagnosis time by 92%. Furthermore, MAPS outperforms two state-of-the-art techniques at improving the machine learning model's performance on mispredicted data without compromising performance on all data. All research artifacts (i.e., tools, scripts, and data) of this study are available on the accompanying website [1].
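
The upweighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' BGMD or MAPS implementation: the function names, the `max_share` threshold, the weighting `factor`, and the toy `device` data are all hypothetical and chosen only for illustration. The sketch finds under-represented values of one feature, checks how often the model mispredicts on rows carrying those values, and assigns those rows a larger per-sample weight for retraining.

```python
from collections import Counter

def underrepresented_values(rows, feature, max_share=0.25):
    """Values of `feature` whose share of all rows is at most `max_share` (assumed threshold)."""
    counts = Counter(row[feature] for row in rows)
    total = len(rows)
    return {value for value, count in counts.items() if count / total <= max_share}

def misprediction_rate(rows, mispredicted, predicate):
    """Fraction of the rows matching `predicate` that the model got wrong."""
    flags = [wrong for row, wrong in zip(rows, mispredicted) if predicate(row)]
    return sum(flags) / len(flags) if flags else 0.0

def upweighted_sample_weights(rows, predicate, factor=3.0):
    """Per-sample weights: `factor` for rows in the misprediction-prone area, 1.0 otherwise."""
    return [factor if predicate(row) else 1.0 for row in rows]

# Toy data (hypothetical): 8 desktop rows, 2 tablet rows; both tablet rows are mispredicted.
rows = [{"device": "desktop"} for _ in range(8)] + [{"device": "tablet"} for _ in range(2)]
mispredicted = [False] * 7 + [True] + [True, True]

rare = underrepresented_values(rows, "device")

def in_area(row):
    """True when the row carries an under-represented value of the feature."""
    return row["device"] in rare

rate = misprediction_rate(rows, mispredicted, in_area)
weights = upweighted_sample_weights(rows, in_area)
```

Weights of this kind could then be handed to a learner that accepts per-sample weights at retraining time; which learner and which weighting scheme the paper actually uses is not stated in the abstract.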
Pages: 1559 - 1570
Number of pages: 12