Information gain directed genetic algorithm wrapper feature selection for credit rating

Cited by: 216
Authors
Jadhav, Swati [1]
He, Hongmei [1]
Jenkins, Karl [1]
Affiliations
[1] Cranfield Univ, Sch Aerosp Transport & Mfg, Cranfield MK43 0AL, Beds, England
Keywords
Feature selection; Genetic algorithm in wrapper; Support vector machine; K nearest neighbour clustering; Naive Bayes classifier; Information gain; Credit scoring; Accuracy; ROC curve; SUPPORT VECTOR MACHINES; SWARM OPTIMIZATION; CLASSIFICATION; HYBRID; COMBINATION; MODEL; SVM; SET
DOI
10.1016/j.asoc.2018.04.033
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Financial credit scoring is a crucial process in the finance industry for assessing the creditworthiness of individuals and enterprises. Various statistics-based machine learning techniques have been employed for this task, but the "curse of dimensionality" remains a significant challenge for them. Some research has used a genetic algorithm (GA) as a wrapper for feature selection (FS) to improve the performance of credit scoring models. However, the challenge lies in finding an overall best method for credit scoring problems and in speeding up the time-consuming feature selection process. In this study, the credit scoring problem is investigated through feature selection to improve classification performance. This work proposes a novel approach to feature selection in credit scoring applications, called the Information Gain Directed Feature Selection algorithm (IGDFS), which ranks features by information gain and propagates the top m features through the GA wrapper (GAW) algorithm, using three classical machine learning algorithms, KNN, Naive Bayes (NB) and Support Vector Machine (SVM), for credit scoring. The first stage, information gain guided feature selection, helps reduce the computational complexity of the GA wrapper, and the information gain of the features selected with IGDFS indicates their importance to decision making. Regarding classification accuracy, SVM is always more accurate than KNN and NB for the baseline techniques, GAW and IGDFS. We can also conclude that IGDFS achieved better performance than generic GAW, and GAW better performance than the corresponding single classifiers (baseline), in almost all cases; the exception is the German Credit dataset, where IGDFS + KNN performs worse than generic GAW and the single KNN classifier. Removing features with low information gain could conflict with the original data structure for KNN and thus affect the performance of IGDFS + KNN. Regarding ROC performance, for the German Credit dataset the three classic machine learning algorithms (SVM, KNN and Naive Bayes) in the IGDFS GA wrapper obtained almost the same performance. For the Australian Credit dataset and the Taiwan Credit dataset, IGDFS + Naive Bayes achieved the largest area under the ROC curve. (C) 2018 Elsevier B.V. All rights reserved.
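The first stage described in the abstract, ranking features by information gain and keeping a fixed number of top-ranked features (called m here), can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete feature values, uses a hypothetical four-row toy dataset, and omits the GA wrapper stage entirely. The function names `entropy`, `information_gain` and `top_m_features` are chosen for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_m_features(rows, labels, m):
    """Return (indices of the m highest-gain features, all gains)."""
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:m], gains


# Hypothetical toy data (not from the paper): three discrete features per applicant.
rows = [
    ("low", "a", "x"),
    ("low", "b", "x"),
    ("high", "a", "y"),
    ("high", "b", "x"),
]
labels = ["good", "good", "bad", "bad"]

# Keep the two most informative features; these would then be handed
# to a wrapper search (not shown here).
selected, gains = top_m_features(rows, labels, 2)
print(selected, [round(g, 4) for g in gains])
```

In this toy data the first feature separates the classes perfectly and the second carries no information, so the ranking keeps the first and third features. Continuous features, as found in real credit datasets, would need discretisation or a different estimator before this calculation applies.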
Pages: 541-553
Number of pages: 13