Imbalanced data issues in machine learning classifiers: a case study

Cited by: 1
Authors
Gong, Mingxing [1]
Affiliation
[1] Univ Delaware, Alfred Lerner Coll Business, Inst Financial Serv Analyt, Purnell Hall, Newark, DE 19716 USA
Source
JOURNAL OF OPERATIONAL RISK | 2022 / Vol. 17 / Issue 04
Keywords
machine learning; imbalanced data; fraud risk; performance measures; cost sensitive learning; CLASSIFICATION;
DOI
10.21314/JOP.2022.027
Chinese Library Classification number
F8 [Public Finance, Finance];
Discipline classification code
0202;
Abstract
Machine learning classifiers are widely used in financial applications. Due to the nature of certain classification problems, special care should be taken when dealing with imbalanced data. In practice, many model developers and validators fail to take this into account in their model development and validation. In addition, resampling is a common technique to address imbalanced data issues when building traditional logistic regression models. However, there has been no specific discussion regarding the resampling ratio used to rebalance the data or how the issue of imbalance impacts different kinds of machine learning classifiers, especially the more advanced ones. This paper aims to outline the special characteristics of the classifiers, compare different methods in dealing with imbalanced data issues and provide best practice in model development, evaluation and validation to avoid common pitfalls. Although the methods discussed in this paper can apply to general machine learning classifiers in applications with imbalanced data issues, by using a case study in credit card fraud detection this paper calls practitioners' attention to the imbalanced data problems therein, where class imbalance is often mistreated and lacks theoretical discussion.
Pages: 17 - 36
Page count: 20
Related papers
50 in total
  • [31] An empirical study of the behavior of classifiers on imbalanced and overlapped data sets
    Garcia, Vicente
    Sanchez, Jose
    Mollineda, Ramon
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2007, 4756 : 397 - +
  • [32] Handling imbalanced data in supervised machine learning for lithological mapping using remote sensing and airborne geophysical data
    Nugroho, Hary
    Wikantika, Ketut
    Bijaksana, Satria
    Saepuloh, Asep
    OPEN GEOSCIENCES, 2023, 15 (01)
  • [33] Using Imbalanced Triangle Synthetic Data for Machine Learning Anomaly Detection
    Luo, Menghua
    Wang, Ke
    Cai, Zhiping
    Liu, Anfeng
    Li, Yangyang
    Cheang, Chak Fong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 58 (01) : 15 - 26
  • [34] Machine learning classifiers in glaucoma
    Bowd, Christopher
    Goldbaum, Michael H.
    OPTOMETRY AND VISION SCIENCE, 2008, 85 (06) : 396 - 405
  • [35] A Performance Analysis of Classifiers on Imbalanced Data
    Garcia, Nathan F.
    Strzoda, Romulo A.
    Lucca, Giancarlo
    Borges, Eduardo N.
    ICEIS: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2022, : 602 - 609
  • [36] Exploring the Challenges of Diagnosing Thyroid Disease with Imbalanced Data and Machine Learning: A Systematic Literature Review
    Saleh, Dhekre Saber
    Othman, Mohd Shahizan
    BAGHDAD SCIENCE JOURNAL, 2024, 21 (03) : 1119 - 1136
  • [37] Copying Machine Learning Classifiers
    Unceta, Irene
    Nin, Jordi
    Pujol, Oriol
    IEEE ACCESS, 2020, 8 (08) : 160268 - 160284
  • [38] Comparing the performance of meta-classifiers: a case study on selected imbalanced data sets relevant for prediction of liver toxicity
    Jain, Sankalp
    Kotsampasakou, Eleni
    Ecker, Gerhard F.
    JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2018, 32 (05) : 583 - 590
  • [39] Integrating Data Selection and Extreme Learning Machine for Imbalanced Data
    Mahdiyah, Umi
    Irawan, M. Isa
    Imah, Elly Matul
    INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE (ICCSCI 2015), 2015, 59 : 221 - 229
  • [40] Classification of Imbalanced Immunotherapy and Health-Related Data Utilising Novel Machine Learning Experiments
    Mahmoud, Ahsanullah Yunas
    ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS, UKCI 2022, 2024, 1454 : 158 - 169