Classification with machine learning algorithms after hybrid feature selection in imbalanced data sets

Cited by: 0
Authors
Pulat, Meryem [1]
Kocakoc, Ipek Deveci [2]
Affiliations
[1] Firat Univ, Dept Business, TR-23169 Elazig, Turkiye
[2] Dokuz Eylul Univ, Fac Econ & Business Adm, Dept Econometr, Izmir, Turkiye
Keywords
machine learning; ensemble learning; classification; feature selection; unbalanced dataset
DOI
10.37190/ord240410
Chinese Library Classification (CLC) numbers
C93 [Management Science]; O22 [Operations Research]
Subject classification codes
070105; 12; 1201; 1202; 120202
Abstract
The efficacy of machine learning algorithms depends significantly on the adequacy and relevance of the features in the data set; hence, feature selection precedes the classification process. In this study, a hybrid feature selection approach integrating filter and wrapper methods was employed. This approach not only improves classification accuracy beyond what filter methods alone achieve but also reduces processing time compared with relying exclusively on wrapper methods. Results indicate a general improvement in algorithm performance when the hybrid feature selection approach is applied. The study used the Taiwanese Bankruptcy and Statlog (German Credit Data) datasets from the UCI Machine Learning Repository. Both datasets have an imbalanced class distribution, which required data preprocessing that accounts for this imbalance. After accounting for the class imbalance, feature selection and classification were carried out.
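The abstract does not name the specific filter criterion, wrapper search strategy, classifiers, or imbalance treatment, so the following is only a minimal sketch of the general filter-then-wrapper idea under stated assumptions. It assumes a Fisher score as the filter, greedy forward selection as the wrapper, a toy nearest-centroid classifier, and stratified cross-validated balanced accuracy as the imbalance-aware criterion (in place of any resampling the authors may have used). None of these choices is claimed to be the authors' method; all function names and the demo data are hypothetical. The filter ranks every feature once, cheaply. The expensive wrapper, which refits the classifier for each trial subset, then only searches the short list the filter keeps.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def fisher_score(X: Matrix, y: Sequence[int], j: int) -> float:
    """Assumed filter: squared gap of class means over summed within-class variances."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    gap = (statistics.fmean(pos) - statistics.fmean(neg)) ** 2
    spread = statistics.pvariance(pos) + statistics.pvariance(neg)
    return gap / spread if spread > 0 else 0.0


def stratified_folds(y: Sequence[int], k: int) -> List[List[int]]:
    """Deal each class's row indices round-robin so every fold contains minority rows."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        for n, i in enumerate(members):
            folds[n % k].append(i)
    return folds


def nearest_centroid_predict(X: Matrix, y: Sequence[int], train: List[int],
                             test: List[int], feats: List[int]) -> List[int]:
    """Hypothetical toy classifier: pick the class whose training centroid is closest."""
    centroids: Dict[int, List[float]] = {}
    for cls in (0, 1):
        rows = [X[i] for i in train if y[i] == cls]
        centroids[cls] = [statistics.fmean([r[j] for r in rows]) for j in feats]
    preds = []
    for i in test:
        dist = {cls: sum((X[i][j] - c) ** 2 for j, c in zip(feats, cen))
                for cls, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def balanced_accuracy(true: Sequence[int], pred: Sequence[int]) -> float:
    """Mean per-class recall; unlike plain accuracy, not inflated by the majority class."""
    recalls = []
    for cls in (0, 1):
        hits = [p == cls for t, p in zip(true, pred) if t == cls]
        if hits:
            recalls.append(sum(hits) / len(hits))
    return statistics.fmean(recalls)


def cv_score(X: Matrix, y: Sequence[int], feats: List[int], k: int) -> float:
    """Stratified k-fold balanced accuracy of the toy classifier on one feature subset."""
    folds = stratified_folds(y, k)
    true: List[int] = []
    pred: List[int] = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        pred += nearest_centroid_predict(X, y, train, test, feats)
        true += [y[i] for i in test]
    return balanced_accuracy(true, pred)


def hybrid_select(X: Matrix, y: Sequence[int], top_k: int, k: int = 3) -> Tuple[List[int], float]:
    # Stage 1 (filter): score every feature once and keep only the top_k candidates.
    ranked = sorted(range(len(X[0])), key=lambda j: fisher_score(X, y, j), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2 (wrapper): greedy forward selection, re-running cross-validation per trial subset.
    selected: List[int] = []
    best = 0.0
    while candidates:
        scores = {j: cv_score(X, y, selected + [j], k) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:
            break  # no remaining candidate improves the cross-validated score
        selected.append(j_best)
        candidates.remove(j_best)
        best = scores[j_best]
    return selected, best


# Hypothetical 9:3 imbalanced demo data: feature 0 separates the classes,
# feature 1 is alternating noise, feature 2 is constant.
DEMO_X = [[0.1, 1.0, 0.5], [0.2, 0.0, 0.5], [0.0, 1.0, 0.5], [0.3, 0.0, 0.5],
          [0.1, 1.0, 0.5], [0.2, 0.0, 0.5], [0.0, 1.0, 0.5], [0.3, 0.0, 0.5],
          [0.1, 1.0, 0.5], [2.0, 0.0, 0.5], [2.1, 1.0, 0.5], [1.9, 0.0, 0.5]]
DEMO_Y = [0] * 9 + [1] * 3

if __name__ == "__main__":
    print(hybrid_select(DEMO_X, DEMO_Y, top_k=2))  # prints ([0], 1.0)
```

On the demo data, the filter discards the constant feature 2 without training anything. The wrapper then adds feature 0, reaches a perfect cross-validated balanced accuracy, and stops because adding the remaining noise feature cannot improve on it. Balanced accuracy is used here because a classifier that always predicts the majority class would score 0.75 in plain accuracy on a 9:3 split, but only 0.5 in balanced accuracy.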
Pages: 157-183
Page count: 27