DeforestVis: Behaviour Analysis of Machine Learning Models with Surrogate Decision Stumps

Cited by: 1
Authors
Chatzimparmpas, Angelos [1]
Martins, Rafael M. [2]
Telea, Alexandru C. [3]
Kerren, Andreas [2, 4]
Affiliations
[1] Northwestern Univ, Dept Comp Sci, Evanston, IL 60208 USA
[2] Linnaeus Univ, Dept Comp Sci & Media Technol, Vaxjo, Sweden
[3] Univ Utrecht, Dept Informat & Comp Sci, Utrecht, Netherlands
[4] Linkoping Univ, Dept Sci & Technol, Norrkoping, Sweden
Keywords
surrogate model; model understanding; adaptive boosting; machine learning; visual analytics; visualization; extraction; selection
DOI
10.1111/cgf.15004
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
As the complexity of machine learning (ML) models increases and their application in different (and critical) domains grows, there is a strong demand for more interpretable and trustworthy ML. A direct, model-agnostic way to interpret such models is to train surrogate models, such as rule sets and decision trees, that sufficiently approximate the original ones while being simpler and easier to explain. Yet rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal: providing users with model interpretability. To tackle this, we propose DeforestVis, a visual analytics tool that summarizes the behaviour of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the Adaptive Boosting (AdaBoost) technique. DeforestVis helps users to explore the complexity versus fidelity trade-off by incrementally generating more stumps, creating attribute-based explanations with weighted stumps to justify decision making, and analysing the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case analyses. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.
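The surrogate idea described in the abstract can be illustrated with a minimal sketch. It is not the DeforestVis implementation, which this record does not show. It assumes a hypothetical `black_box` function standing in for the complex model, fits AdaBoost-weighted decision stumps to that function's predictions, and reports fidelity (the share of held-out points on which the surrogate agrees with the black box) as more stumps are used. All function names, the synthetic data, and the choice of 20 boosting rounds are illustrative assumptions.

```python
import math
import random


def black_box(x):
    # Hypothetical stand-in for a complex model: +1 inside the unit circle.
    return 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else -1


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    Assumes the weights w sum to 1, so flipping a rule turns error e into 1 - e.
    """
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        # Candidate thresholds: midpoints between consecutive feature values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            # Weighted error of the rule "predict +1 if x[f] <= t else -1".
            err = sum(wi for x, yi, wi in zip(X, y, w)
                      if (1 if x[f] <= t else -1) != yi)
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, f, t, polarity)
    return best


def stump_predict(stump, x):
    _, f, t, polarity = stump
    return polarity if x[f] <= t else -polarity


def adaboost(X, y, rounds):
    """Discrete AdaBoost over decision stumps; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of instances this stump misclassified.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, x))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def surrogate_predict(ensemble, x, n_stumps):
    score = sum(a * stump_predict(s, x) for a, s in ensemble[:n_stumps])
    return 1 if score >= 0 else -1


def fidelity(ensemble, X, y_black_box, n_stumps):
    """Fraction of points where the surrogate agrees with the black box."""
    agree = sum(surrogate_predict(ensemble, x, n_stumps) == yi
                for x, yi in zip(X, y_black_box))
    return agree / len(X)


rng = random.Random(0)


def sample(n):
    return [(rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)) for _ in range(n)]


X_train, X_test = sample(200), sample(200)
y_train = [black_box(x) for x in X_train]  # labels come from the model, not ground truth
y_test = [black_box(x) for x in X_test]

ensemble = adaboost(X_train, y_train, rounds=20)
for k in (1, 5, 10, 20):
    print(k, "stumps -> held-out fidelity",
          round(fidelity(ensemble, X_test, y_test, k), 3))
```

Each stump is a single threshold rule on one attribute and `alpha` is its weight in the vote, so truncating the ensemble to its first `k` stumps is one simple way to inspect the complexity versus fidelity trade-off the abstract mentions.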
Pages: 19