Joint Sample Position Based Noise Filtering and Mean Shift Clustering for Imbalanced Classification Learning

Cited by: 1
Authors
Duan, Lilong [1, 2]
Xue, Wei [1, 2]
Huang, Jun [1, 2]
Zheng, Xiao [1, 2]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, Maanshan 243032, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Source
TSINGHUA SCIENCE AND TECHNOLOGY | 2024 / Vol. 29 / No. 01
Keywords
Clustering algorithms; Filtering algorithms; Benchmark testing; Sampling methods; Information filters; Cleaning; Classification algorithms; imbalanced data classification; oversampling; noise filtering; clustering; OVERSAMPLING TECHNIQUE; SMOTE; PREDICTION
DOI
10.26599/TST.2023.9010006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The problem of imbalanced data classification learning has received much attention. Conventional classification algorithms are susceptible to data skew: they favor majority samples and ignore minority samples. The majority weighted minority oversampling technique (MWMOTE) is an effective approach to this problem; however, it may suffer from inadequate noise filtering and from synthesizing samples identical to the original minority data. To this end, we propose an improved MWMOTE method named joint sample position based noise filtering and mean shift clustering (SPMSC). First, to effectively eliminate the effect of noisy samples, SPMSC uses a new noise filtering mechanism that determines whether a minority sample is noisy based on its position and distribution relative to the majority samples. Second, because MWMOTE may generate duplicate samples, we employ the mean shift algorithm to cluster minority samples and thereby reduce duplicate synthetic samples. Finally, data cleaning is performed on the processed data to further eliminate class overlap. Experiments on extensive benchmark datasets demonstrate the effectiveness of SPMSC compared with other sampling methods.
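As a rough illustration of the clustering idea described in the abstract, the sketch below groups minority samples with a simple flat-kernel mean shift and then interpolates only between members of the same cluster, rejecting any synthetic point that coincides with an original sample. It is not the authors' SPMSC implementation: the function names, the bandwidth value, the toy data, and the interpolation rule are all assumptions made for illustration, and the noise filtering and data cleaning steps are omitted because the abstract does not specify them.

```python
import math
import random


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            # Average all points lying within `bandwidth` of the current estimate.
            near = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = [sum(coord) / len(near) for coord in zip(*near)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Merge modes that converged to (nearly) the same location.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def oversample_within_clusters(minority, labels, n_new, seed=0):
    """Hypothetical helper: interpolate between two samples of one cluster."""
    rng = random.Random(seed)
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    # Only clusters with at least two members can be interpolated.
    usable = [c for c in clusters.values() if len(c) >= 2]
    seen = {tuple(p) for p in minority}
    synthetic = []
    if not usable:
        return synthetic
    for _ in range(20 * n_new):  # bounded number of attempts
        if len(synthetic) >= n_new:
            break
        a, b = rng.sample(rng.choice(usable), 2)
        t = rng.uniform(0.05, 0.95)  # stay strictly between the endpoints
        s = tuple(x + t * (y - x) for x, y in zip(a, b))
        if s not in seen:  # skip duplicates of existing samples
            seen.add(s)
            synthetic.append(s)
    return synthetic


# Toy minority class made of two well-separated groups (assumed data).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
labels = mean_shift(minority, bandwidth=1.0)
synthetic = oversample_within_clusters(minority, labels, n_new=4)
print(labels)
print(synthetic)
```

Restricting interpolation to a single cluster keeps synthetic points inside a region that already contains minority samples, and the duplicate check addresses the replicate-sample issue the abstract attributes to MWMOTE. In practice a library implementation of mean shift with an estimated bandwidth would likely replace the hand-written loop.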
Pages: 216-231
Page count: 16
Related papers
50 in total
  • [31] OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification
    Junnan Li
    Qingsheng Zhu
    Applied Intelligence, 2023, 53 : 30987 - 31017
  • [32] Clustering-based Binary-class Classification for Imbalanced Data Sets
    Chen, Chao
    Shyu, Mei-Ling
    2011 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2011, : 384 - 389
  • [33] Spark-based ensemble learning for imbalanced data classification
    Ding J.
    Wang S.
    Jia L.
    You J.
    Jiang Y.
    International Journal of Performability Engineering, 2018, 14 (05) : 945 - 964
  • [34] CLUSTERING-BASED SUBSET ENSEMBLE LEARNING METHOD FOR IMBALANCED DATA
    Hu, Xiao-Sheng
    Zhang, Run-Jing
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 35 - 39
  • [35] An imbalanced ensemble learning method based on dual clustering and stage-wise hybrid sampling
    Li, Fan
    Wang, Bo
    Wang, Pin
    Jiang, Mingfeng
    Li, Yongming
    APPLIED INTELLIGENCE, 2023, 53 (18) : 21167 - 21191
  • [36] IMBALANCED DATA CLASSIFICATION BASED ON EXTREME LEARNING MACHINE AUTOENCODER
    Shen, Chu
    Zhang, Su-Fang
    Zhai, Jun-Hal
    Luo, Ding-Sheng
    Chen, Jun-Fen
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 2, 2018, : 399 - 404
  • [37] A Novel Imbalanced Data Classification Method Based on Weakly Supervised Learning for Fault Diagnosis
    Liu, Hui
    Liu, Zhenyu
    Jia, Weiqiang
    Zhang, Donghao
    Tan, Jianrong
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (03) : 1583 - 1593
  • [38] mCRF and mRD: Two Classification Methods Based on a Novel Multiclass Label Noise Filtering Learning Framework
    Xia, Shuyin
    Chen, Baiyun
    Wang, Guoyin
    Zheng, Yong
    Gao, Xinbo
    Giem, Elisabeth
    Chen, Zizhong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (07) : 2916 - 2930
  • [39] Multigraph Random Walk for Joint Learning of Multiview Clustering and Semisupervised Classification
    Wang, Shiping
    Fu, Lele
    Wang, Zhewen
    Xu, Haiping
    Zhu, William
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 9 (03) : 926 - 939
  • [40] ClusterCNN: Clustering-Based Feature Learning for Hyperspectral Image Classification
    Yao, Wei
    Lian, Cheng
    Bruzzone, Lorenzo
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (11) : 1991 - 1995