An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets

Times cited: 8
Authors
Thejas, G. S. [1 ]
Hariprasad, Yashas [2 ]
Iyengar, S. S. [2 ]
Sunitha, N. R. [3 ]
Badrinath, Prajwal [2 ]
Chennupati, Shasank [4 ]
Affiliations
[1] Tarleton State Univ, Texas A&M Univ Syst, Dept Comp Sci & Elect Engn, Stephenville, TX 76401 USA
[2] Florida Int Univ, Knight Fdn Sch Comp & Informat Sci, Discovery Lab, Miami, FL 33199 USA
[3] Siddaganga Inst Technol, Dept Comp Sci & Engn, Tumakuru 572103, Karnataka, India
[4] Univ North Carolina Chapel Hill, Sch Med, Chapel Hill, NC 27599 USA
Source
MACHINE LEARNING WITH APPLICATIONS | 2022 / Vol. 8
Keywords
Imbalanced data; Oversampling; SMOTE; Noise filter; OVER-SAMPLING TECHNIQUE; DATA SETS; CLASSIFICATION; ALGORITHM; CLASSIFIERS; GENERATION; NOISY
DOI
10.1016/j.mlwa.2022.100267
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
More often than not, data collected in real time tends to be imbalanced, i.e., the samples belonging to one class significantly outnumber those of the others. This degrades the performance of the predictor. One of the most notable algorithms for handling such imbalance by generating synthetic data is the Synthetic Minority Oversampling Technique (SMOTE). However, data imbalance is not solely responsible for poor classifier performance: several studies have shown that noisy samples can play a significant role in misclassification. In addition, handling large datasets is computationally expensive, so data reduction is imperative. In this work, we put forth a novel extension of SMOTE that integrates it with the Kalman filter. The proposed method, Kalman-SMOTE (KSMOTE), filters out noisy samples from the dataset produced by SMOTE, which contains both the raw data and the synthetically generated samples, thereby also reducing the size of the dataset. Our model is validated on a wide range of datasets, and an experimental analysis of the results shows that it outperforms currently available techniques.
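The SMOTE oversampling step that KSMOTE builds on can be sketched as follows. This is a minimal pure-Python sketch of standard SMOTE interpolation. It is not the authors' KSMOTE code. The function name `smote` and its parameters (`n_synthetic`, `k`, `seed`) are illustrative assumptions. Base samples are drawn at random, which is a common implementation variant. The Kalman-filter noise-removal stage is omitted because the abstract does not specify how it is applied.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical sketch of standard SMOTE interpolation (not KSMOTE).

    minority: list of feature vectors (lists of floats) from the minority class.
    n_synthetic: number of synthetic samples to create.
    k: number of nearest minority neighbours considered for each base sample.
    The Kalman-filter noise-removal stage of KSMOTE is NOT implemented here.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Choose a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new sample at a random point on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2, seed=0)
balanced_minority = minority + new_points
```

Each synthetic point is a convex combination of two real minority samples, so it always lies inside the region the minority class already occupies. In KSMOTE, as the abstract describes it, a filtering pass would then run over `minority + new_points` together with the majority data to discard noisy samples.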
Pages: 12