Improvements in PD models. A case-study approach

Cited by: 0
Authors
Caplescu, Raluca Dana [1]
Cojocea, Manuela-Simona [2]
Pele, Daniel Traian [1]
Strat, Vasile Alecsandru [1]
Affiliations
[1] Bucharest Univ Econ Studies, Bucharest, Romania
[2] Univ Bucharest, Bucharest, Romania
Source
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BUSINESS EXCELLENCE | 2021, Vol. 15, No. 1
Keywords
PD models; outlier treatment; impact analysis; P2P lending; Lending Club;
DOI
10.2478/picbe-2021-0004
Chinese Library Classification
F [Economics]
Discipline classification code
02
Abstract
Models for estimating the probability of default (PD) are widely used throughout the lending process, starting as early as the application stage, where they play an important role in the loan approval decision. Ensuring adequate data quality is essential for model soundness and performance. Identifying outliers, analyzing their impact and choosing the right method to treat them is a necessary preprocessing stage, yet it is often overlooked in practice for a variety of reasons, an important one being insufficient data. Given the inherent imbalance of the loan portfolio with regard to default status, eliminating outliers is seldom feasible. The current widely accepted approach is based on binning and weight of evidence. Usually two types of binning are tested, namely bucket and quantile. While the latter is robust to the presence of outliers, the former is not. Both approaches discretize the continuous variable they are applied to. This causes information loss, both in the variation carried by individual values and in the distance between observations on a given variable. In the present paper, we explore the opportunity of using other methods for dealing with outliers and describe their advantages and disadvantages in the context of probability of default estimation for credit risk. We conclude that, aside from quantile binning, winsorizing is also effective, as is leaving outliers untreated in very large datasets. More importantly, several methods should be considered and tested for each variable in order to find the optimal balance between altering the data and reducing variance.
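The treatments named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the data are synthetic, "bucket" binning is assumed to mean equal-width bins, the weight of evidence uses one common convention (log of the share of non-defaults over the share of defaults, with an assumed 0.5 smoothing term), and all variable and function names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): 1000 incomes plus one extreme outlier.
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10.5, sigma=0.4, size=1000))
income.iloc[0] = 5_000_000.0

# Synthetic default flag (assumption): default is likelier at low income.
p_default = 0.25 - 0.20 * income.rank(pct=True)
default = pd.Series((rng.random(1000) < p_default).astype(int))

# Bucket binning, read here as equal-width bins: the outlier stretches
# the range, so nearly every observation lands in the first bin.
bucket = pd.cut(income, bins=5, labels=False)

# Quantile binning: each bin holds the same share of observations,
# so a single extreme value does not distort the bin edges.
quantile = pd.qcut(income, q=5, labels=False)

def weight_of_evidence(bin_ids: pd.Series, flag: pd.Series) -> pd.Series:
    """WoE per bin as ln(share of non-defaults / share of defaults)."""
    df = pd.DataFrame({"bin": bin_ids, "default": flag})
    g = df.groupby("bin")["default"].agg(bad="sum", total="count")
    g["good"] = g["total"] - g["bad"]
    k = len(g)
    # The 0.5 smoothing is an assumption that avoids log(0) in sparse bins.
    good_share = (g["good"] + 0.5) / (g["good"].sum() + 0.5 * k)
    bad_share = (g["bad"] + 0.5) / (g["bad"].sum() + 0.5 * k)
    return np.log(good_share / bad_share)

woe_bucket = weight_of_evidence(bucket, default)
woe_quantile = weight_of_evidence(quantile, default)

# Winsorizing at the 1st and 99th percentiles (cut-offs are an assumption):
# extreme values are pulled in to the cut-offs, interior values stay as they
# are, so the variable remains continuous.
lower, upper = income.quantile(0.01), income.quantile(0.99)
winsorized = income.clip(lower=lower, upper=upper)

print(bucket.value_counts().sort_index())
print(quantile.value_counts().sort_index())
print(woe_quantile)
```

In this sketch the equal-width bins collapse almost all observations into one bin, while the quantile bins and the winsorized variable are barely affected by the outlier, which matches the contrast the abstract draws between the two binning types.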
Pages: 13-32
Number of pages: 20