Minimizing the Overlapping Degree to Improve Class-Imbalanced Learning Under Sparse Feature Selection: Application to Fraud Detection

Cited by: 40
Authors
Fatima, El Barakaz [1]
Omar, Boutkhoum [1]
Abdelmajid, El Moutaouakkil [1]
Rustam, Furqan [2]
Mehmood, Arif [3]
Choi, Gyu Sang [4]
Affiliations
[1] Chouaib Doukkali Univ, Fac Sci, LAROSERI Lab, El Jadida 24000, Morocco
[2] Khwaja Fareed Univ Engn & Informat Technol KFUEIT, Dept Comp Sci, Rahim Yar Khan 64200, Pakistan
[3] Islamia Univ Bahawalpur, Dept Comp Sci & Informat Technol, Punjab 63100, Pakistan
[4] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Feature extraction; Measurement; Training; Support vector machines; Predictive models; Prediction algorithms; Data models; Augmented R-value; class-imbalance; feature selection; fraud detection; overlapping; CLASSIFICATION; CLASSIFIERS; SYSTEM; CHALLENGES; ALGORITHM; SMOTE;
DOI
10.1109/ACCESS.2021.3056285
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In recent years, the classification of class-imbalanced data has received increasing attention across scientific areas such as fraud detection, metabolomics, and cancer diagnosis. This interest follows evidence that class overlap degrades the performance of class-imbalanced learning. Based on the augmented R-value, our proposed strategy selects the features that give the data a minimal degree of overlap, thereby improving classification performance. In this context, we present three feature selection algorithms, RONS (Reduce Overlapping with No-sampling), ROS (Reduce Overlapping with SMOTE), and ROA (Reduce Overlapping with ADASYN), which use sparse feature selection to minimize overlap and perform binary classification. ROS and ROA additionally include a re-sampling step. Simulation results show that the proposed algorithms, used as feature selection methods, manage the variation of the false discovery rate while selecting the main features for modeling. For the experiments, four credit card datasets were selected to test the performance of the algorithms. Under the F-measure and G-mean evaluation metrics, the results indicate that the proposed algorithms are clearly preferable to classical feature selection methods. Moreover, this feature selection strategy can be extended as an alternative for class-imbalanced learning problems that involve overlapping.
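The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions (F-measure as the weighted harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the exact formulas used in the paper are not given in this record, and the function name and the confusion-matrix counts below are hypothetical.

```python
import math

def f_measure_and_gmean(tp, fp, fn, tn, beta=1.0):
    """Return (F-measure, G-mean) from binary confusion-matrix counts.

    Assumes the positive class is the minority (e.g. fraud) class.
    Hypothetical helper, not taken from the paper.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f = (1 + b2) * precision * recall / denom if denom else 0.0
    gmean = math.sqrt(recall * specificity)
    return f, gmean

# Illustrative counts only: 100 fraud cases among 1000 transactions.
f, g = f_measure_and_gmean(tp=80, fp=20, fn=20, tn=880)

# A classifier that labels everything as legitimate reaches 99% accuracy
# on 10 fraud cases in 1000, yet both metrics drop to zero.
f_trivial, g_trivial = f_measure_and_gmean(tp=0, fp=0, fn=10, tn=990)
```

Both metrics depend on how well the minority class is recovered, which is why they are generally preferred over plain accuracy on imbalanced data.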
Pages: 28101-28110
Page count: 10