A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification

Cited by: 0
Authors
Ping Gong
Junguang Gao
Li Wang
Affiliations
[1] Beijing Technology and Business University, Business School
[2] Beihang University, School of Economics and Management
[3] Beihang University, School of Economics and Management
[4] Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operation
Source
Journal of Systems Science and Systems Engineering | 2022 / Volume 31
Keywords
Imbalance classification; credit classification; class overlap; evolutionary under-sampling; genetic algorithm
DOI
Not available
Abstract
Credit risk assessment is an important risk management task for financial institutions. Machine learning-based approaches have made promising progress in credit risk assessment by treating it as an imbalanced binary classification task. However, few efforts have been made to handle the class overlap problem that co-occurs with class imbalance. To this end, this study proposes a Tomek link and genetic algorithm (GA)-based under-sampling framework (TEUS) that addresses class imbalance and overlap in binary credit classification by eliminating majority class instances while considering multi-perspective factors. First, TEUS identifies boundary majority instances with Tomek links; it then takes the distance from each majority instance to its nearest boundary instance as a radius and assigns the density of opposite-class samples within that radius as the overlap potential of that majority instance. Second, TEUS weights each non-borderline majority instance by its information contribution to estimating class labels. After partitioning the non-borderline majority instances into subgroups according to overlap potential and information contribution, TEUS applies a GA to select samples from the subgroups and merges them with the minority samples into a new training set. A key novelty is that the design of the GA fitness function and the grouping of the non-borderline majority instances not only trade off the multi-perspective characteristics of instances but also reduce the computational complexity of the sampling optimization search. Numerical experiments on real-world credit data sets demonstrate the effectiveness of the proposed TEUS.
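The first stage of TEUS described in the abstract (Tomek-link boundary detection followed by an overlap potential for each majority instance) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: distances are Euclidean, "nearest boundary" is read as the nearest boundary *majority* instance, and "density" is taken as the share of minority samples among all other samples inside the closed ball of that radius. The function names `tomek_links` and `overlap_potential` are hypothetical, and the information-contribution weighting and GA selection steps are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_neighbor(i: int, X: Sequence[Point]) -> int:
    """Index of the point closest to X[i] (Euclidean; ties go to the lower index)."""
    best_j, best_d = -1, math.inf
    for j, p in enumerate(X):
        if j != i:
            d = math.dist(X[i], p)
            if d < best_d:
                best_j, best_d = j, d
    return best_j


def tomek_links(X: Sequence[Point], y: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j) of opposite-class samples that are each other's nearest neighbor."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def overlap_potential(X: Sequence[Point], y: Sequence[int],
                      majority: int) -> Dict[int, float]:
    """Overlap potential of each non-boundary majority instance.

    Assumptions (hypothetical, not from the paper): the radius is the distance
    to the nearest boundary majority instance, and "density" is the share of
    minority samples among all other samples inside the closed ball.
    """
    boundary = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    if not boundary:
        return {}  # no Tomek links, so there is no boundary to measure against
    scores: Dict[int, float] = {}
    for i, x in enumerate(X):
        if y[i] != majority or i in boundary:
            continue
        radius = min(math.dist(x, X[b]) for b in boundary)
        # The ball always contains the nearest boundary instance, so it is non-empty.
        inside = [j for j in range(len(X)) if j != i and math.dist(x, X[j]) <= radius]
        n_minority = sum(1 for j in inside if y[j] != majority)
        scores[i] = n_minority / len(inside)
    return scores


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.5, 0.0), (4.0, 0.0), (6.0, 0.0)]
    y = [0, 0, 0, 1, 1, 0]  # 0 = majority class, 1 = minority class
    print(tomek_links(X, y))                    # [(2, 3)]
    print(overlap_potential(X, y, majority=0))  # {0: 0.0, 1: 0.0, 5: 0.6666666666666666}
```

On this toy data, the majority point at (6, 0) lies among minority samples and receives the highest overlap potential, which marks it as a candidate for removal, while the majority points far from the boundary score 0. The brute-force O(n²) neighbor search keeps the sketch dependency-free; a realistic pipeline would use a spatial index.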
Pages: 728–752
Page count: 24