Leveraging Feature Bias for Scalable Misprediction Explanation of Machine Learning Models

Times cited: 2

Authors:

Gesi, Jiri [1]

Shen, Xinyun [1]

Geng, Yunfan [1]

Chen, Qihong [1]

Ahmed, Iftekhar [1]

Affiliation:

[1] Univ Calif Irvine, Donald Bren Sch ICS, Irvine, CA 92717 USA

Source:

2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE | 2023

Keywords:

machine learning; data imbalance; rule induction; misprediction explanation

DOI:

10.1109/ICSE48619.2023.00135

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Subject classification code:

081202; 0835

Abstract:

Interpreting and debugging machine learning models is necessary to ensure their robustness, and explaining mispredictions can help significantly in doing so. While recent work on misprediction explanation has proven promising in generating interpretable explanations for mispredictions, state-of-the-art techniques "blindly" deduce misprediction explanation rules from all data features, an approach that may not scale as the number of features grows. To alleviate this problem, we propose an efficient misprediction explanation technique named Bias Guided Misprediction Diagnoser (BGMD), which leverages two pieces of prior knowledge about data: a) data often exhibit highly skewed feature distributions, and b) trained models in many cases perform poorly on sub-datasets with under-represented features. Next, we propose a technique named MAPS (Mispredicted Area UPweight Sampling). During model retraining, MAPS increases the weights of sub-datasets belonging to groups that are prone to misprediction because they contain under-represented features. Thus, MAPS makes the retrained model pay more attention to the under-represented features. Our empirical study shows that BGMD outperformed the state-of-the-art misprediction diagnoser and reduced diagnosis time by 92%. Furthermore, MAPS outperformed two state-of-the-art techniques at improving the machine learning model's performance on mispredicted data without compromising its performance on all data. All the research artifacts (i.e., tools, scripts, and data) of this study are available on the accompanying website [1].
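The abstract describes BGMD and MAPS only at a high level. The Python sketch below illustrates the two pieces of prior knowledge and the upweighting idea under stated assumptions; it is not the authors' implementation. All function names, the 20% share threshold for "under-represented", the 50% error-rate threshold, and the fixed weight factor of 3.0 are hypothetical choices. Single (feature, value) groups also stand in for the explanation rules that BGMD actually induces, which is a deliberate simplification.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]        # one sample: categorical feature name -> value
Group = Tuple[str, str]     # a (feature, value) pair defining a sub-dataset


def under_represented_values(rows: Sequence[Row], max_share: float = 0.2) -> List[Group]:
    """Prior (a): flag feature values whose share of the data is at most max_share.

    The 20% default is a hypothetical threshold, not taken from the paper.
    """
    flagged: List[Group] = []
    features = sorted({feature for row in rows for feature in row})
    for feature in features:
        counts = Counter(row[feature] for row in rows if feature in row)
        total = sum(counts.values())
        for value, count in sorted(counts.items()):
            if count / total <= max_share:
                flagged.append((feature, value))
    return flagged


def mispredicted_groups(
    rows: Sequence[Row],
    y_true: Sequence[int],
    y_pred: Sequence[int],
    candidates: Sequence[Group],
    min_error_rate: float = 0.5,
) -> List[Group]:
    """Prior (b): keep only candidate groups on which the model errs often.

    Only these few candidates are checked, instead of every feature value.
    """
    kept: List[Group] = []
    for feature, value in candidates:
        members = [i for i, row in enumerate(rows) if row.get(feature) == value]
        if not members:
            continue
        errors = sum(1 for i in members if y_true[i] != y_pred[i])
        if errors / len(members) >= min_error_rate:
            kept.append((feature, value))
    return kept


def upweight(rows: Sequence[Row], groups: Sequence[Group], factor: float = 3.0) -> List[float]:
    """MAPS-style idea: give samples in any flagged group a larger retraining weight.

    A fixed multiplicative factor is an assumption; the paper's scheme may differ.
    """
    return [factor if any(row.get(f) == v for f, v in groups) else 1.0 for row in rows]


# Toy usage: region "B" is rare (2 of 10 samples) and both of its samples are mispredicted.
rows = [{"region": "A", "device": d} for d in ["phone", "desktop"] * 4]
rows += [{"region": "B", "device": "phone"}, {"region": "B", "device": "desktop"}]
y_true = [1, 0] * 5
y_pred = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1]

candidates = under_represented_values(rows)
groups = mispredicted_groups(rows, y_true, y_pred, candidates)
weights = upweight(rows, groups)
print(groups)   # [('region', 'B')]
print(weights)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0]
```

Filtering by skew first means the error check runs only on a handful of rare groups rather than on every feature, which mirrors the scalability argument in the abstract. Many scikit-learn estimators accept per-sample weights through the `sample_weight` argument of `fit`, which is one way a retraining step could consume the weights produced here.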


Pages: 1559-1570

Page count: 12
