Flexible loss functions for binary classification in gradient-boosted decision trees: An application to credit scoring

Cited by: 10
Authors
Mushava, Jonah [1]
Murray, Michael [1]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Westville Campus, Private Bag X54001, ZA-4000 Durban, South Africa
Keywords
Class imbalance; Machine learning; Credit scoring; XGBoost; Freddie Mac; BANKRUPTCY PREDICTION; ENSEMBLE; PERFORMANCE; CHALLENGES; MODEL; RISK
DOI
10.1016/j.eswa.2023.121876
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces new flexible loss functions for binary classification in Gradient-Boosted Decision Trees (GBDT). The losses combine Dice-based and cross-entropy-based terms and offer link functions derived from either a generalized extreme value (GEV) or an exponentiated exponential logistic (EEL) distribution. Tests of 27 different GBDT models, built with XGBoost on a Freddie Mac mortgage loan database, showed that the choice of loss function affects model performance. Specifically, when the class imbalance ratio (IR) is less than 99, using a skewed GEV distribution-based link function in XGBoost enhances discriminatory power and classification accuracy while retaining a simple model structure, which is particularly important in credit scoring applications. When class imbalance is severe, typically for IRs between 99 and 200, an advanced loss function composed of a symmetric hybrid loss function and a link derived from a positively skewed EEL distribution outperforms the other XGBoost variants. Our findings indicate that the accuracy improvements from these proposed extensions lower misclassification costs, most evidently when IR is below 99, and thereby raise profitability for the business. The study also highlights the transparency associated with GBDT, which is an integral requirement of financial applications. Researchers and practitioners can use these insights to create more accurate and discriminative machine learning models, with possible extensions to other GBDT implementations and to other machine learning techniques that rely on loss functions. The source code for the proposed approach is publicly available at https://github.com/jm-ml/flexible-losses-for-binary-classification-with-GBDT.
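The abstract does not give the exact formulas, so the following Python sketch only illustrates the ingredients it names, under stated assumptions. It assumes that IR is the ratio of majority to minority examples, that the GEV link maps a raw score to a probability through the GEV cumulative distribution function with shape parameter `xi`, and that the hybrid loss is a weighted sum of cross-entropy and a soft Dice loss. The function names, the weight `alpha`, and the default `xi` are hypothetical and are not taken from the authors' repository.

```python
import math


def imbalance_ratio(labels):
    """Assumed definition: number of negatives divided by number of positives."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    return n_neg / n_pos


def gev_inverse_link(z, xi=-0.25):
    """Map a raw score z to a probability with the GEV CDF (shape xi).

    Unlike the logistic link, this curve is asymmetric around p = 0.5.
    xi = 0 gives the Gumbel limit exp(-exp(-z)).
    """
    try:
        if abs(xi) < 1e-8:
            return math.exp(-math.exp(-z))
        t = 1.0 + xi * z
        if t <= 0.0:
            # Outside the support: below it when xi > 0, above it when xi < 0.
            return 0.0 if xi > 0 else 1.0
        return math.exp(-(t ** (-1.0 / xi)))
    except OverflowError:
        # The inner term grows without bound only where the CDF tends to 0.
        return 0.0


def hybrid_loss(labels, probs, alpha=0.5, eps=1e-7):
    """Weighted sum of mean cross-entropy and soft Dice loss (illustrative)."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    ce = -sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(labels, clipped)
    ) / len(labels)
    overlap = sum(y * p for y, p in zip(labels, clipped))
    dice = 1.0 - (2.0 * overlap + eps) / (sum(labels) + sum(clipped) + eps)
    return alpha * ce + (1.0 - alpha) * dice


if __name__ == "__main__":
    labels = [1] + [0] * 99
    print("IR:", imbalance_ratio(labels))
    scores = [2.0] + [-2.0] * 99
    probs = [gev_inverse_link(s) for s in scores]
    print("hybrid loss:", hybrid_loss(labels, probs))
```

To use such a loss as a custom objective in XGBoost, one would additionally need its first and second derivatives with respect to the raw score, since XGBoost custom objectives return a gradient and a Hessian per example; those derivatives are omitted here.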
Pages: 16