New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning

Cited by: 17
Authors
Muslim, Much Aziz [1, 2]
Nikmah, Tiara Lailatul [2 ]
Pertiwi, Dwika Ananda Agustina [2 ]
Subhan [2 ]
Jumanto [2 ]
Dasril, Yosza [1 ]
Iswanto [3 ]
Affiliations
[1] Univ Tun Hussein Onn Malaysia, Fac Technol Management, Johor Baharu 86400, Malaysia
[2] Univ Negeri Semarang, Dept Comp Sci, Semarang 50229, Indonesia
[3] Univ Muhammadiyah Yogyakarta, Dept Elect Engn, Bantul 55183, Indonesia
Source
INTELLIGENT SYSTEMS WITH APPLICATIONS | 2023 / Vol. 18
Keywords
LightGBM; P2P lending; Default risk prediction; Stacking ensemble learning; Improve accuracy prediction; HETEROGENEOUS ENSEMBLE; DEFAULT RISK; CLASSIFICATION; LIGHTGBM; DISTANCE; BUSINESS; LOANS;
DOI
10.1016/j.iswa.2023.200204
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Peer-to-peer (P2P) lending is a financial innovation that offers loans to individuals and companies without intermediaries. P2P lending carries a risk of loan default, which causes losses for the lending company. Many studies have sought to reduce this risk by developing default-prediction classifiers with a focus on increasing accuracy. However, the main obstacles to prediction are data imbalance and poorly performing classification algorithms. The purpose of this study is to improve the accuracy of default risk prediction by balancing the data and combining a stacking ensemble model with a meta-learner. The proposed model consists of three optimization parts: the Synthetic Minority Oversampling Technique (SMOTE), feature selection, and stacking ensemble learning. SMOTE is used to balance the data, while LightGBM feature selection and stacking ensemble learning (LGBFS-StackingXGBoost) are used to optimize accuracy. The stacking ensemble combines three base learners, KNN, SVM and Random Forest, whose outputs feed an XGBoost meta-learner. The model was tested on two datasets: the online P2P lending dataset and the lending club loan data analysis dataset. The evaluation results show that LGBFS-StackingXGBoost is the best-performing model on both datasets, reaching an accuracy of 99.982% on the online P2P lending dataset and 91.434% on the lending club loan data analysis dataset. This study shows that the accuracy of the prediction model can be improved using the LGBFS-StackingXGBoost method.
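The balancing step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority row is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the authors' code, and the abstract does not say which implementation they used. The function name `smote`, its parameters, and the toy data are all assumptions made for illustration, and the base point is drawn at random rather than by iterating over every minority sample, which is a simplification.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration only; not a library API.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up data: 8 non-default rows (label 0), 3 default rows (label 1).
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 12.0), (12.0, 11.0)]

# Generate just enough synthetic default rows to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_X = majority + minority + new_points
balanced_y = [0] * len(majority) + [1] * (len(minority) + len(new_points))
```

In a pipeline like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, so that synthetic rows do not leak into the evaluation data.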
Pages: 8