An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets

Times cited: 8
Authors
Thejas, G. S. [1]
Hariprasad, Yashas [2]
Iyengar, S. S. [2]
Sunitha, N. R. [3]
Badrinath, Prajwal [2]
Chennupati, Shasank [4]
Affiliations
[1] Tarleton State Univ, Texas A&M Univ Syst, Dept Comp Sci & Elect Engn, Stephenville, TX 76401 USA
[2] Florida Int Univ, Knight Fdn Sch Comp & Informat Sci, Discovery Lab, Miami, FL 33199 USA
[3] Siddaganga Inst Technol, Dept Comp Sci & Engn, Tumakuru 572103, Karnataka, India
[4] Univ North Carolina Chapel Hill, Sch Med, Chapel Hill, NC 27599 USA
Source
MACHINE LEARNING WITH APPLICATIONS | 2022 / Vol. 8
Keywords
Imbalanced data; Oversampling; SMOTE; Noise filter; OVER-SAMPLING TECHNIQUE; DATA SETS; SMOTE; CLASSIFICATION; ALGORITHM; CLASSIFIERS; GENERATION; NOISY;
DOI
10.1016/j.mlwa.2022.100267
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
More often than not, data collected in real time tends to be imbalanced, i.e., the samples belonging to one class significantly outnumber those belonging to the others. This degrades the performance of the predictor. One of the most notable algorithms for handling such imbalance by fabricating synthetic data is the Synthetic Minority Oversampling Technique (SMOTE). However, data imbalance is not solely responsible for the poor performance of a classifier. Several studies have shown that noisy samples can contribute significantly to misclassification. In addition, handling large datasets is computationally expensive, so data reduction is imperative. In this work, we put forth a novel extension of SMOTE that integrates it with the Kalman filter. The proposed method, Kalman-SMOTE (KSMOTE), filters out noisy samples from the dataset produced by SMOTE, which contains both the raw data and the synthetically generated samples, thereby also reducing the size of the dataset. Our model is validated on a wide range of datasets, and an experimental analysis of the results shows that it outperforms currently available techniques.
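SMOTE, which KSMOTE extends, creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbors. The Python/NumPy sketch below illustrates only that standard interpolation step under stated assumptions; it is not the authors' implementation, the function name `smote` and its parameters are hypothetical, and the Kalman-filter noise-removal stage is omitted because the abstract does not specify how it is applied.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Hypothetical helper: classic SMOTE interpolation on minority samples.

    X_min: (n, d) array of minority-class samples.
    n_synthetic: number of synthetic samples to generate.
    k: number of nearest minority neighbors to sample from.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # For each new point, pick a random base sample and one of its k neighbors.
    base = rng.integers(0, n, size=n_synthetic)
    nb = neighbors[base, rng.integers(0, k, size=n_synthetic)]

    # Interpolate: x_new = x + lam * (x_nn - x), with lam drawn from U[0, 1).
    lam = rng.random((n_synthetic, 1))
    return X_min[base] + lam * (X_min[nb] - X_min[base])


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_min, n_synthetic=6, k=2, rng=0)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point lies on a segment between two minority samples, the output stays inside the region spanned by the minority class; in the approach described above, the noise filter would then operate on the combined raw and synthetic data.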
Pages: 12
Related papers
50 in total
  • [1] LSMOTE: A link-based Synthetic Minority Oversampling Technique for binary imbalanced datasets
    Cai, Qin-Nan
    Zhang, Zhong-Liang
    Wu, Yu-Heng
    Zhang, Xiu-Ming
    NEUROCOMPUTING, 2024, 608
  • [2] A Synthetic Minority Based on Probabilistic Distribution (SyMProD) Oversampling for Imbalanced Datasets
    Kunakorntum, Intouch
    Hinthong, Woranich
    Phunchongharn, Phond
    IEEE ACCESS, 2020, 8 : 114692 - 114704
  • [3] Fuzzy-synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets
    Xu, Yanping
    Wu, Chunhua
    Zheng, Kangfeng
    Niu, Xinxin
    Yang, Yixian
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2017, 13 (04):
  • [4] A No Parameter Synthetic Minority Oversampling Technique Based on Finch for Imbalanced Data
    Xu, Shoukun
    Li, Zhibang
    Yuan, Baohua
    Yang, Gaochao
    Wang, Xueyuan
    Li, Ning
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 367 - 378
  • [5] An improved and random synthetic minority oversampling technique for imbalanced data
    Wei, Guoliang
    Mu, Weimeng
    Song, Yan
    Dou, Jun
    KNOWLEDGE-BASED SYSTEMS, 2022, 248
  • [6] A novel adaptive boundary weighted and synthetic minority oversampling algorithm for imbalanced datasets
    Song, Xudong
    Chen, Yilin
    Liang, Pan
    Wan, Xiaohui
    Cui, Yunxian
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 3245 - 3259
  • [7] An Adaptive Oversampling Technique for Imbalanced Datasets
    Shahee, Shaukat Ali
    Ananthakumar, Usha
    ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS (ICDM 2018), 2018, 10933 : 1 - 16
  • [8] KNNOR: An oversampling technique for imbalanced datasets
    Islam, Ashhadul
    Belhaouari, Samir Brahim
    Rehman, Atiq Ur
    Bensmail, Halima
    APPLIED SOFT COMPUTING, 2022, 115
  • [9] A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification
    Liu, Ruijuan
    APPLIED INTELLIGENCE, 2023, 53 (01) : 786 - 803
  • [10] STB: synthetic minority oversampling technique for tree-boosting models for imbalanced datasets of intrusion detection systems
    Li, Li-Hua
    Ahmad, Ramli
    Tanone, Radius
    Sharma, Alok Kumar
    PEERJ COMPUTER SCIENCE, 2023, 9