Investigating the Evolution of Tree Boosting Models with Visual Analytics

Cited by: 10
Authors
Wang, Junpeng [1]
Zhang, Wei [1]
Wang, Liang [1]
Yang, Hao [1]
Affiliations
[1] Visa Res, Palo Alto, CA 94306 USA
Source
2021 IEEE 14TH PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS 2021) | 2021
Keywords
NEURAL-NETWORKS;
DOI
10.1109/PacificVis52677.2021.00032
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Tree boosting models are widely adopted predictive models and have demonstrated performance superior to that of other conventional and even deep learning models, especially since the recent release of their parallel and distributed implementations, e.g., XGBoost, LightGBM, and CatBoost. Tree boosting uses a group of sequentially generated weak learners (i.e., decision trees), each of which learns from the mistakes of its predecessor, to push the model's decision boundary towards the true boundary. As the number of trees keeps increasing over training, it is important to reveal how the newly added trees change the predictions of individual data instances, and how the impacts of different data features evolve. To accomplish these goals, in this paper, we introduce a new design of the temporal confusion matrix, providing users with an effective interface to track data instances' predictions across the tree boosting process. Also, we present an improved visualization to better illustrate and compare the impacts of individual data features (based on their SHAP values) across training iterations. Integrating these components with a tree structure visualization component, we propose a visual analytics system for tree boosting models. Through case studies with domain experts using real-world datasets, we validated the system's effectiveness.
Pages: 186-195
Page count: 10
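
Illustrative sketch (not part of the record)
The abstract describes boosting as a sequence of weak learners, each correcting its predecessor, and stresses tracking how instance predictions change as trees are added. The sketch below is one minimal way to picture that idea, under loud assumptions: it uses least-squares boosting with one-dimensional decision stumps in plain Python, not the authors' system and not XGBoost, LightGBM, or CatBoost. The names `fit_stump`, `boost`, and `confusion` are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, List, Tuple


def fit_stump(xs: List[float], residuals: List[float]) -> Tuple[float, float, float]:
    """Find the one-split stump (threshold, left value, right value)
    that minimizes squared error against the current residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)  # left is never empty: x == t is included
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= t else rv)) ** 2 for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1], best[2], best[3]


def boost(xs: List[float], ys: List[int], n_trees: int = 5,
          learning_rate: float = 0.5) -> List[List[float]]:
    """Return the per-instance scores after each added stump.

    Each stump is fit to the residuals (the 'mistakes') left by the
    ensemble built so far, which is the sequential idea in the abstract.
    """
    scores = [0.5] * len(xs)  # assumed constant starting prediction
    history = []
    for _ in range(n_trees):
        residuals = [y - s for y, s in zip(ys, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        scores = [s + learning_rate * (lv if x <= t else rv)
                  for x, s in zip(xs, scores)]
        history.append(scores)  # a fresh list each iteration, so no aliasing
    return history


def confusion(ys: List[int], scores: List[float], cut: float = 0.5) -> Dict[str, int]:
    """Count confusion-matrix cells for one boosting iteration."""
    cells = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(ys, scores):
        pred = 1 if s > cut else 0
        if pred == 1:
            cells["TP" if y == 1 else "FP"] += 1
        else:
            cells["TN" if y == 0 else "FN"] += 1
    return cells
```

Calling `confusion(ys, scores)` on each entry of the list returned by `boost` gives one confusion matrix per iteration; comparing consecutive entries shows which instances move between cells as trees are added, which is roughly the kind of information a temporal confusion matrix is meant to expose.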