Investigating the Evolution of Tree Boosting Models with Visual Analytics

Cited by: 10
Authors
Wang, Junpeng [1]
Zhang, Wei [1]
Wang, Liang [1]
Yang, Hao [1]
Affiliation
[1] Visa Res, Palo Alto, CA 94306 USA
Source
2021 IEEE 14TH PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS 2021) | 2021
Keywords
NEURAL-NETWORKS;
DOI
10.1109/PacificVis52677.2021.00032
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tree boosting models are widely adopted predictive models and have demonstrated performance superior to other conventional models and even deep learning models, especially since the recent release of their parallel and distributed implementations, e.g., XGBoost, LightGBM, and CatBoost. Tree boosting uses a group of sequentially generated weak learners (i.e., decision trees), each of which learns from the mistakes of its predecessor, to push the model's decision boundary towards the true boundary. As the number of trees keeps increasing over training, it is important to reveal how the newly added trees change the predictions of individual data instances, and how the impacts of different data features evolve. To accomplish these goals, in this paper, we introduce a new design of the temporal confusion matrix, providing users with an effective interface to track data instances' predictions across the tree boosting process. Also, we present an improved visualization to better illustrate and compare the impacts of individual data features (based on their SHAP values) across training iterations. Integrating these components with a tree structure visualization component, we propose a visual analytics system for tree boosting models. Through case studies with domain experts using real-world datasets, we validated the system's effectiveness.
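
The two ideas in the abstract, weak learners fitted sequentially to their predecessors' mistakes and a confusion matrix recorded after every boosting round, can be illustrated with a minimal sketch. It is not the paper's system and not XGBoost, LightGBM, or CatBoost: it assumes squared-loss boosting with depth-1 trees (stumps) on a made-up one-dimensional dataset, and every name in it (`fit_stump`, `confusion`, `boost`, the data, the learning rate) is hypothetical.

```python
# Toy sketch: squared-loss gradient boosting with depth-1 trees (stumps),
# recording a confusion matrix after every boosting round.
# All names and data are illustrative assumptions, not the paper's method.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) with the lowest squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def confusion(labels, scores, cutoff=0.5):
    """Count TP, FP, TN, FN for scores thresholded at cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(labels, scores):
        pred = 1 if s >= cutoff else 0
        key = ("T" if pred == y else "F") + ("P" if pred == 1 else "N")
        counts[key] += 1
    return counts

def boost(xs, labels, n_trees=20, learning_rate=0.5):
    """Fit stumps one after another; return one confusion matrix per round."""
    scores = [sum(labels) / len(labels)] * len(xs)  # round 0: mean label
    history = [confusion(labels, scores)]
    for _ in range(n_trees):
        # Residuals are the current ensemble's mistakes; the next stump fits them.
        residuals = [y - s for y, s in zip(labels, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        scores = [s + learning_rate * (lv if x <= t else rv)
                  for x, s in zip(xs, scores)]
        history.append(confusion(labels, scores))
    return history

# Made-up data: one feature, binary labels.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
history = boost(xs, labels)
for round_index in (0, 1, 5, 20):
    print(round_index, history[round_index])
```

Reading `history` in order shows how instances move between the cells of the confusion matrix as trees are added, which is the kind of per-iteration information the temporal confusion matrix is described as exposing.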
Pages: 186-195
Page count: 10
Related papers
35 in total
[1]   Do Convolutional Neural Networks Learn Class Hierarchy? [J].
Alsallakh, Bilal ;
Jourabloo, Amin ;
Ye, Mao ;
Liu, Xiaoming ;
Ren, Liu .
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2018, 24 (01) :152-162
[2]   Visual Methods for Analyzing Probabilistic Classification Data [J].
Alsallakh, Bilal ;
Hanbury, Allan ;
Hauser, Helwig ;
Miksch, Silvia ;
Rauber, Andreas .
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2014, 20 (12) :1703-1712
[3]   ModelTracker: Redesigning Performance Analysis Tools for Machine Learning [J].
Amershi, Saleema ;
Chickering, Max ;
Drucker, Steven M. ;
Lee, Bongshin ;
Simard, Patrice ;
Suh, Jina .
CHI 2015: PROCEEDINGS OF THE 33RD ANNUAL CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2015, :337-346
[4]  
Anguita D., 2013, P EUR S ART NEUR NET
[5]  
[Anonymous], 2006, Proc. of the SIGCHI Conf. on Human Factors in Computing Systems
[6]  
Bishop C.M., 2006, Pattern Recognition and Machine Learning, DOI 10.1007/978-0-387-45528-0
[7]   Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables [J].
Blackard, JA ;
Dean, DJ .
COMPUTERS AND ELECTRONICS IN AGRICULTURE, 1999, 24 (03) :131-151
[8]  
Breiman L., 2017, Classification and Regression Trees, DOI 10.1201/9781315139470
[9]   XGBoost: A Scalable Tree Boosting System [J].
Chen, Tianqi ;
Guestrin, Carlos .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :785-794
[10]   Greedy function approximation: A gradient boosting machine [J].
Friedman, JH .
ANNALS OF STATISTICS, 2001, 29 (05) :1189-1232