MI-MOTE: Multiple imputation-based minority oversampling technique for imbalanced and incomplete data classification

Cited by: 20
Authors
Shin, Kyoham [1]
Han, Jongmin [1]
Kang, Seokho [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Ind Engn, 2066 Seobu Ro, Suwon 16419, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Class imbalance; Data incompleteness; Oversampling; Missing value imputation; Multiple imputation; MISSING DATA; SMOTE;
DOI
10.1016/j.ins.2021.06.043
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Class imbalance and data incompleteness problems occur simultaneously in many real-world classification datasets, which negatively affects the training of classifiers. Given an imbalanced and incomplete training dataset, the conventional approach is to address these two problems sequentially by handling data incompleteness first and then focusing on class imbalance. In this study, we propose a multiple imputation-based minority oversampling technique, named MI-MOTE, to address imbalanced and incomplete data classification simultaneously. Majority instances are imputed once, and minority instances are oversampled using multiple different imputations without directly manipulating any of their observed values. Accordingly, minority instances are diversified with less data distortion compared to the conventional approach. The proposed method is applied in the data preprocessing phase, meaning it can be used with any type of classifier. Experimental results on benchmark datasets with various missing rates demonstrate the effectiveness of the proposed method. (c) 2021 Elsevier Inc. All rights reserved.
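The idea in the abstract (impute majority instances once, and replicate minority instances under several different imputations while leaving observed values untouched) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `draw_imputation` and `mi_oversample`, the `n_imputations` parameter, and the imputer itself are assumptions made for illustration. As a stand-in for a proper multiple-imputation model, each missing entry is filled with a random draw from the observed values of the same feature within the same class.

```python
import numpy as np


def draw_imputation(X, rng):
    """Return a copy of X with NaNs filled by random draws from each column's observed values.

    Stand-in imputer (an assumption, not the paper's method). Observed entries are
    never modified. A column with no observed values at all is left as NaN.
    """
    X_filled = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        observed = X[~missing, j]
        if missing.any() and observed.size > 0:
            X_filled[missing, j] = rng.choice(observed, size=int(missing.sum()))
    return X_filled


def mi_oversample(X, y, minority_label, n_imputations=5, seed=0):
    """Impute majority rows once and stack n_imputations imputed copies of the minority rows."""
    rng = np.random.default_rng(seed)
    is_minority = y == minority_label

    # Majority instances: a single imputation.
    X_majority = draw_imputation(X[~is_minority], rng)

    # Minority instances: several independent imputations, each adding one copy per instance.
    minority_copies = [draw_imputation(X[is_minority], rng) for _ in range(n_imputations)]

    X_out = np.vstack([X_majority] + minority_copies)
    y_out = np.concatenate([y[~is_minority]] + [y[is_minority]] * n_imputations)
    return X_out, y_out
```

In this sketch a minority instance with no missing values is simply duplicated, so diversity comes only from the imputed entries. How the original method chooses the number of imputations or treats fully observed instances is not stated in the abstract and is not modeled here.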
Pages: 80-89
Number of pages: 10
Related papers
50 in total
  • [1] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652
  • [2] A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios
    Tripathi, Ayush
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 10650 - 10657
  • [3] A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification
    Xu, Zhaozhao
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3740 - 3753
  • [4] A No Parameter Synthetic Minority Oversampling Technique Based on Finch for Imbalanced Data
    Xu, Shoukun
    Li, Zhibang
    Yuan, Baohua
    Yang, Gaochao
    Wang, Xueyuan
    Li, Ning
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 367 - 378
  • [5] An Imputation-Based Method for Fuzzy Clustering of Incomplete Data
    Soni, S.
    Sharma, I.
    2017 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), 2017, : 616 - 621
  • [6] A Classification Model for Imbalanced Medical Data based on PCA and Farther Distance based Synthetic Minority Oversampling Technique
    Mustafa, Nadir
    Memon, Raheel A.
    Li, Jian-Ping
    Omer, Mohammed Z.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2017, 8 (01) : 61 - 67
  • [7] Counterfactual-based minority oversampling for imbalanced classification
    Wang, Shu
    Luo, Hao
    Huang, Shanshan
    Li, Qingsong
    Liu, Li
    Su, Guoxin
    Liu, Ming
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122
  • [8] Local distribution-based adaptive minority oversampling for imbalanced data classification
    Wang, Xinyue
    Xu, Jian
    Zeng, Tieyong
    Jing, Liping
    NEUROCOMPUTING, 2021, 422 : 200 - 213
  • [9] Fuzzy rule-based oversampling technique for imbalanced and incomplete data learning
    Liu, Gencheng
    Yang, Youlong
    Li, Benchong
    KNOWLEDGE-BASED SYSTEMS, 2018, 158 : 154 - 174
  • [10] An improved and random synthetic minority oversampling technique for imbalanced data
    Wei, Guoliang
    Mu, Weimeng
    Song, Yan
    Dou, Jun
    KNOWLEDGE-BASED SYSTEMS, 2022, 248