An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets

Cited by: 8
Authors
Thejas, G. S. [1]
Hariprasad, Yashas [2]
Iyengar, S. S. [2]
Sunitha, N. R. [3]
Badrinath, Prajwal [2]
Chennupati, Shasank [4]
Affiliations
[1] Tarleton State Univ, Texas A&M Univ Syst, Dept Comp Sci & Elect Engn, Stephenville, TX 76401 USA
[2] Florida Int Univ, Knight Fdn Sch Comp & Informat Sci, Discovery Lab, Miami, FL 33199 USA
[3] Siddaganga Inst Technol, Dept Comp Sci & Engn, Tumakuru 572103, Karnataka, India
[4] Univ North Carolina Chapel Hill, Sch Med, Chapel Hill, NC 27599 USA
Source
Machine Learning with Applications | 2022 / Vol. 8
Keywords
Imbalanced data; Oversampling; SMOTE; Noise filter; Over-sampling technique; Data sets; Classification; Algorithm; Classifiers; Generation; Noisy
DOI
10.1016/j.mlwa.2022.100267
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data collected in real time are often imbalanced, i.e., the samples belonging to one class significantly outnumber those of the others. This degrades the performance of the predictor. One of the most notable algorithms for handling such an imbalance by generating synthetic data is the Synthetic Minority Oversampling Technique (SMOTE). However, data imbalance is not solely responsible for the poor performance of a classifier. Several research works have demonstrated that noisy samples can play a significant role in misclassification. In addition, handling large data is computationally expensive, so data reduction is imperative. In this work, we put forth a novel extension of SMOTE by integrating it with the Kalman filter. The proposed method, Kalman-SMOTE (KSMOTE), filters out the noisy samples in the final dataset after SMOTE, which includes both the raw data and the synthetically generated samples, thereby reducing the size of the dataset. Our model is validated on a wide range of datasets. An experimental analysis of the results shows that our model outperforms the presently available techniques.
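The oversampling step that KSMOTE builds on can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' implementation: the function name `smote_sketch`, its parameters, and the brute-force neighbor search are assumptions made for illustration only. The abstract does not specify how the Kalman filter is applied to remove noisy samples, so that stage is deliberately left out rather than guessed.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbors,
    which is the core idea of SMOTE. Requires at least two samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Brute-force pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random gap in [0, 1) along each segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Usage: four minority samples on the unit square, ten synthetic points.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=10, k=2)
```

In a KSMOTE-like pipeline, the raw data and `synthetic` would then be combined and passed to the noise-filtering stage described in the abstract, which would shrink the combined dataset.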
Pages: 12