Improving Credit Risk Prediction in Online Peer-to-Peer (P2P) Lending Using Imbalanced Learning Techniques

Cited by: 16
Authors
Boiko Ferreira, Luis Eduardo [1]
Barddal, Jean Paul [1]
Enembreck, Fabricio [1]
Gomes, Heitor Murilo [2]
Affiliations
[1] Pontificia Univ Catolica Parana, Grad Program Informat PPGIa, Curitiba, Parana, Brazil
[2] Univ Paris Saclay, Telecom ParisTech, LTCI, Paris, France
Source
2017 IEEE 29TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2017) | 2017
Keywords
CLASSIFICATION;
DOI
10.1109/ICTAI.2017.00037
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Peer-to-peer (P2P) lending is a global trend in financial markets that allows individuals to obtain and grant loans without financial institutions acting as a strong proxy. Like many real-world applications, P2P lending is imbalanced: the number of creditworthy loan requests is much larger than the number of non-creditworthy ones. In this work, we wrangle a real-world P2P lending data set from Lending Club, containing a large amount of data gathered from 2007 to 2016. We analyze how supervised classification models and techniques for handling class imbalance affect creditworthiness prediction rates. Ensemble, cost-sensitive, and sampling methods are combined and evaluated alongside logistic regression, decision tree, and Bayesian learning schemes. Results show that, on average, sampling techniques outperform ensemble and cost-sensitive approaches.
Pages: 175-181
Number of pages: 7