MI-MOTE: Multiple imputation-based minority oversampling technique for imbalanced and incomplete data classification

Cited by: 20
Authors
Shin, Kyoham [1 ]
Han, Jongmin [1 ]
Kang, Seokho [1 ]
Affiliation
[1] Sungkyunkwan Univ, Dept Ind Engn, 2066 Seobu Ro, Suwon 16419, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Class imbalance; Data incompleteness; Oversampling; Missing value imputation; Multiple imputation; MISSING DATA; SMOTE;
DOI
10.1016/j.ins.2021.06.043
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Class imbalance and data incompleteness occur simultaneously in many real-world classification datasets, and both negatively affect the training of classifiers. Given an imbalanced and incomplete training dataset, the conventional approach is to address these two problems sequentially: handle data incompleteness first, then deal with class imbalance. In this study, we propose a multiple imputation-based minority oversampling technique, named MI-MOTE, that addresses imbalanced and incomplete data classification simultaneously. Majority instances are imputed once, and minority instances are oversampled using multiple different imputations without directly manipulating any of their observed values. Accordingly, minority instances are diversified with less data distortion than under the conventional approach. The proposed method is applied in the data preprocessing phase, so it can be used with any type of classifier. Experimental results on benchmark datasets with various missing rates demonstrate the effectiveness of the proposed method. (c) 2021 Elsevier Inc. All rights reserved.
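The abstract states the core idea only at a high level: majority instances are imputed once, while each minority instance is replicated under several different imputations, leaving its observed entries untouched. The Python sketch below illustrates that idea under assumptions that are ours, not the paper's. The hypothetical function `mi_mote_sketch` uses per-feature mean imputation for the majority class and, as a simple stand-in for a proper multiple-imputation model, fills each missing minority entry with a random draw from that feature's observed minority values. The parameter `n_imputations` (how many imputed copies of the minority class to produce) is likewise illustrative.

```python
import numpy as np


def mi_mote_sketch(X, y, minority_label, n_imputations, rng=None):
    """Hypothetical sketch of imputation-based minority oversampling.

    Assumptions (not taken from the paper):
      * majority rows: a single imputation with per-feature means of the
        observed majority values;
      * minority rows: n_imputations stochastic imputations, each filling a
        missing entry with a random draw from the observed minority values of
        that feature. Observed entries are never modified.
    Returns the preprocessed (X, y) with the minority class replicated
    n_imputations times.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    X_maj, X_min = X[~is_min], X[is_min]

    # Majority class: one deterministic imputation (column means).
    maj_means = np.nanmean(X_maj, axis=0)
    X_maj_imp = np.where(np.isnan(X_maj), maj_means, X_maj)

    # Minority class: several different imputations of the same rows.
    copies = []
    for _ in range(n_imputations):
        X_copy = X_min.copy()
        for j in range(X_min.shape[1]):
            missing = np.isnan(X_min[:, j])
            observed = X_min[~missing, j]
            if missing.any() and observed.size > 0:
                X_copy[missing, j] = rng.choice(observed, size=missing.sum())
        copies.append(X_copy)
    X_min_imp = np.vstack(copies)

    X_out = np.vstack([X_maj_imp, X_min_imp])
    y_out = np.concatenate(
        [y[~is_min], np.full(len(X_min_imp), minority_label, dtype=y.dtype)]
    )
    return X_out, y_out


if __name__ == "__main__":
    nan = np.nan
    X = [[1.0, nan], [2.0, 4.0], [3.0, 6.0], [7.0, nan], [nan, 9.0], [10.0, 8.0]]
    y = [0, 0, 0, 1, 1, 1]
    X_res, y_res = mi_mote_sketch(X, y, minority_label=1, n_imputations=2, rng=0)
    print(X_res.shape, np.bincount(y_res))
```

Under this sketch, a minority row with no missing values yields identical copies, so added diversity comes only from rows that actually have gaps; how the paper treats such rows is not described in the abstract. Because the output is an ordinary (X, y) pair, it can be handed to any classifier, consistent with the abstract's point that the method operates at the preprocessing stage.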
Pages: 80-89
Number of pages: 10
Related papers
50 in total
  • [31] An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets
    Thejas, G. S.
    Hariprasad, Yashas
    Iyengar, S. S.
    Sunitha, N. R.
    Badrinath, Prajwal
    Chennupati, Shasank
    MACHINE LEARNING WITH APPLICATIONS, 2022, 8
  • [32] Oversampling the minority class in a multi-linear feature space for imbalanced data classification
    Liang, Peifeng
    Li, Weite
    Hu, Jinglu
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2018, 13 (10) : 1483 - 1491
  • [33] Synthetic minority oversampling technique based on natural neighborhood graph with subgraph cores for class-imbalanced classification
    Zhao, Ming
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (01)
  • [34] Radial-Based oversampling for noisy imbalanced data classification
    Koziarski, Michal
    Krawczyk, Bartosz
    Wozniak, Michal
    NEUROCOMPUTING, 2019, 343 : 19 - 33
  • [35] Radial-Based Oversampling for Multiclass Imbalanced Data Classification
    Krawczyk, Bartosz
    Koziarski, Michal
    Wozniak, Michal
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (08) : 2818 - 2831
  • [36] MWMOTE-Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning
    Barua, Sukarna
    Islam, Md. Monirul
    Yao, Xin
    Murase, Kazuyuki
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (02) : 405 - 425
  • [37] Distance-based arranging oversampling technique for imbalanced data
    Dai, Qi
    Liu, Jian-wei
    Zhao, Jia-Liang
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (02) : 1323 - 1342
  • [39] A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets
    Carrig, Madeline M.
    Manrique-Vallier, Daniel
    Ranby, Krista W.
    Reiter, Jerome P.
    Hoyle, Rick H.
    MULTIVARIATE BEHAVIORAL RESEARCH, 2015, 50 (04) : 383 - 397
  • [40] Multiple Imputation by Generative Adversarial Networks for Classification with Incomplete Data
    Bao Ngoc Vi
    Dinh Tan Nguyen
    Cao Truong Tran
    Huu Phuc Ngo
    Chi Cong Nguyen
    Hai-Hong Phan
    2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021 : 162 - 167