Explaining Mispredictions of Machine Learning Models using Rule Induction

Cited by: 16
Authors
Cito, Juergen [1, 2]
Dillig, Isil [3, 4]
Kim, Seohyun [4]
Murali, Vijayaraghavan [4]
Chandra, Satish [4]
Affiliations
[1] TU Wien, Vienna, Austria
[2] Facebook, Vienna, Austria
[3] UT Austin, Austin, TX USA
[4] Facebook, Menlo Pk, CA USA
Source
PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) | 2021
Keywords
explainability; rule induction; machine learning;
DOI
10.1145/3468264.3468614
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
While machine learning (ML) models play an increasingly prevalent role in many software engineering tasks, their prediction accuracy is often problematic. When these models do mispredict, it can be very difficult to isolate the cause. In this paper, we propose a technique that aims to facilitate the debugging process of trained statistical models. Given an ML model and a labeled data set, our method produces an interpretable characterization of the data on which the model performs particularly poorly. The output of our technique can be useful for understanding limitations of the training data or the model itself; it can also be useful for ensembling if there are multiple models with different strengths. We evaluate our approach through case studies and illustrate how it can be used to improve the accuracy of predictive models used for software engineering tasks within Facebook. We also compare our algorithm against related rule induction techniques to illustrate its advantages in the context of explaining mispredictions of machine learning models.
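The input/output contract described in the abstract (a model plus a labeled data set in, an interpretable description of the poorly handled data out) can be illustrated with a small sketch. This is not the authors' algorithm: it is a brute-force enumeration of short conjunctive rules, scored by precision and recall over the mispredicted rows. The function name `misprediction_rules`, its parameters, the toy features `lang` and `size`, and the toy model are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def misprediction_rules(rows, labels, predict, max_conjuncts=2, min_support=2):
    """Rank conjunctions of `feature == value` tests by how well they
    isolate the rows that `predict` gets wrong (illustrative sketch only)."""
    # Flag each row: True if the model's prediction differs from the label.
    wrong = [predict(r) != y for r, y in zip(rows, labels)]
    total_wrong = sum(wrong)

    # Candidate atoms: every (feature, value) pair observed in the data.
    atoms = sorted({(k, v) for r in rows for k, v in r.items()})

    results = []
    for size in range(1, max_conjuncts + 1):
        for rule in combinations(atoms, size):
            # Skip rules that test the same feature twice.
            if len({k for k, _ in rule}) < size:
                continue
            # Misprediction flags of the rows that satisfy every atom.
            covered = [
                w for r, w in zip(rows, wrong)
                if all(r.get(k) == v for k, v in rule)
            ]
            if len(covered) < min_support:
                continue
            hits = sum(covered)
            precision = hits / len(covered)  # share of covered rows that are wrong
            recall = hits / total_wrong if total_wrong else 0.0
            results.append((precision, recall, rule))

    # Best rules first: high precision, then high recall, then fewer atoms.
    results.sort(key=lambda t: (-t[0], -t[1], len(t[2])))
    return results


# Toy data (hypothetical): categorical features describing code samples.
rows = [
    {"lang": "py", "size": "small"},
    {"lang": "py", "size": "large"},
    {"lang": "js", "size": "small"},
    {"lang": "js", "size": "large"},
    {"lang": "js", "size": "large"},
    {"lang": "py", "size": "small"},
]
labels = [1, 1, 0, 1, 1, 1]


def toy_model(row):
    # Stand-in for a trained model: predicts 1 only for Python samples.
    return 1 if row["lang"] == "py" else 0


best = misprediction_rules(rows, labels, toy_model)[0]
print(best)
```

On this toy data the top-ranked rule is `lang == "js" AND size == "large"`, which covers exactly the two rows the toy model mispredicts. Exhaustive enumeration is only workable for a handful of features; it is used here to keep the example short and says nothing about how the paper searches the rule space.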
Pages: 716-727
Number of pages: 12