Imbalanced data issues in machine learning classifiers: a case study

Cited by: 1
Author
Gong, Mingxing [1]
Affiliation
[1] Univ Delaware, Alfred Lerner Coll Business, Inst Financial Serv Analyt, Purnell Hall, Newark, DE 19716 USA
Source
JOURNAL OF OPERATIONAL RISK | 2022 / Volume 17 / Issue 04
Keywords
machine learning; imbalanced data; fraud risk; performance measures; cost-sensitive learning; classification
DOI
10.21314/JOP.2022.027
Chinese Library Classification (CLC) number
F8 [Public finance, finance]
Discipline classification code
0202
Abstract
Machine learning classifiers are widely used in financial applications. Due to the nature of certain classification problems, special care should be taken when dealing with imbalanced data. In practice, many model developers and validators fail to take this into account in their model development and validation. In addition, resampling is a common technique to address imbalanced data issues when building traditional logistic regression models. However, there has been no specific discussion regarding the resampling ratio used to rebalance the data or how the issue of imbalance impacts different kinds of machine learning classifiers, especially the more advanced ones. This paper aims to outline the special characteristics of the classifiers, compare different methods in dealing with imbalanced data issues and provide best practice in model development, evaluation and validation to avoid common pitfalls. Although the methods discussed in this paper can apply to general machine learning classifiers in applications with imbalanced data issues, by using a case study in credit card fraud detection this paper calls practitioners' attention to the imbalanced data problems therein, where class imbalance is often mistreated and lacks theoretical discussion.
Pages: 17-36
Page count: 20
Related papers
50 in total
  • [21] Evaluation of the Classifiers in Multiparameter and Imbalanced Data Sets
    Piotrowska, Ewelina
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2019, PT II, 2020, 1051 : 263 - 273
  • [22] Customer purchase prediction from the perspective of imbalanced data: A machine learning framework based on factorization machine
    Chen, Shui-xia
    Wang, Xiao-kang
    Zhang, Hong-yu
    Wang, Jian-qiang
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 173
  • [23] An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data
    Kokkotis, Christos
    Giarmatzis, Georgios
    Giannakou, Erasmia
    Moustakidis, Serafeim
    Tsatalas, Themistoklis
    Tsiptsios, Dimitrios
    Vadikolias, Konstantinos
    Aggelousis, Nikolaos
    DIAGNOSTICS, 2022, 12 (10)
  • [24] Machine Learning Classifiers for Endometriosis Using Transcriptomics and Methylomics Data
    Akter, Sadia
    Xu, Dong
    Nagel, Susan C.
    Bromfield, John J.
    Pelch, Katherine
    Wilshire, Gilbert B.
    Joshi, Trupti
    FRONTIERS IN GENETICS, 2019, 10
  • [25] Learning Imbalanced Classifiers Locally and Globally with One-Side Probability Machine
    Huang, Kaizhu
    Zhang, Rui
    Yin, Xu-Cheng
    NEURAL PROCESSING LETTERS, 2015, 41 (03) : 311 - 323
  • [27] Optical Music Recognition as the Case of Imbalanced Pattern Recognition: A Study of Single Classifiers
    Jastrzebska, Agnieszka
    Lesinski, Wojciech
    KNOWLEDGE, INFORMATION AND CREATIVITY SUPPORT SYSTEMS: RECENT TRENDS, ADVANCES AND SOLUTIONS, KICSS 2013, 2016, 364 : 493 - 505
  • [28] Metric Learning from Imbalanced Data
    Gautheron, Leo
    Habrard, Amaury
    Morvant, Emilie
    Sebban, Marc
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 923 - 930
  • [29] Constructing classifiers for imbalanced data using diversity optimisation
    Khorshidi, Hadi A.
    Aickelin, Uwe
    INFORMATION SCIENCES, 2021, 565 : 1 - 16
  • [30] On the Role of Cost-Sensitive Learning in Imbalanced Data Oversampling
    Krawczyk, Bartosz
    Wozniak, Michal
    COMPUTATIONAL SCIENCE - ICCS 2019, PT III, 2019, 11538 : 180 - 191