Joint Sample Position Based Noise Filtering and Mean Shift Clustering for Imbalanced Classification Learning

Times cited: 1
Authors
Duan, Lilong [1, 2]
Xue, Wei [1, 2]
Huang, Jun [1, 2]
Zheng, Xiao [1, 2]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, Maanshan 243032, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Source
TSINGHUA SCIENCE AND TECHNOLOGY | 2024 / Vol. 29 / No. 01
Keywords
Clustering algorithms; Filtering algorithms; Benchmark testing; Sampling methods; Information filters; Cleaning; Classification algorithms; imbalanced data classification; oversampling; noise filtering; clustering; OVERSAMPLING TECHNIQUE; SMOTE; PREDICTION;
DOI
10.26599/TST.2023.9010006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The problem of imbalanced data classification learning has received much attention. Conventional classification algorithms are susceptible to data skew: they tend to favor majority samples and ignore minority samples. The majority weighted minority oversampling technique (MWMOTE) is an effective approach to this problem; however, it may suffer from inadequate noise filtering and may synthesize samples identical to the original minority data. To address these shortcomings, we propose an improved MWMOTE method named joint sample position based noise filtering and mean shift clustering (SPMSC). First, to effectively eliminate the effect of noisy samples, SPMSC uses a new noise filtering mechanism that determines whether a minority sample is noisy based on its position and distribution relative to the majority samples. Second, because MWMOTE may generate duplicate samples, we employ the mean shift algorithm to cluster the minority samples and thereby reduce synthetic duplicates. Finally, data cleaning is performed on the processed data to further eliminate class overlap. Experiments on a wide range of benchmark datasets demonstrate the effectiveness of SPMSC compared with other sampling methods.
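The abstract outlines a three-stage pipeline (noise filtering, mean shift clustering of the minority class before synthesis, then data cleaning). The sketch below illustrates that flow on 2-D toy data; it is not the authors' SPMSC implementation. The abstract does not state the exact noise rule, kernel, bandwidth, or cleaning method, so the following are assumptions: a minority point counts as noise when all `k` of its nearest neighbours are majority points; mean shift uses a flat kernel with a fixed `bandwidth`; synthetic points are interpolated between two distinct members of the same cluster (MWMOTE's informative-sample weighting is omitted and pairs are drawn uniformly); the minority class is oversampled to the majority size; and Tomek-link removal stands in for the unspecified cleaning step. All function names are hypothetical.

```python
import math
import random


def filter_noise(minority, majority, k=3):
    """Hypothetical noise rule: drop a minority point if its k NNs are all majority."""
    kept = []
    for i, p in enumerate(minority):
        # (distance, label) pairs; label 1 = minority, 0 = majority.
        neigh = [(math.dist(p, q), 1) for j, q in enumerate(minority) if j != i]
        neigh += [(math.dist(p, q), 0) for q in majority]
        neigh.sort()
        if any(label == 1 for _, label in neigh[:k]):
            kept.append(p)
    return kept


def mean_shift(points, bandwidth, tol=1e-9, max_iter=300):
    """Flat-kernel mean shift; returns one integer cluster label per point."""
    modes = []
    for x in points:
        for _ in range(max_iter):
            # Shift x to the mean of all points inside its window.
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            if not window:
                break
            new_x = tuple(sum(c) / len(window) for c in zip(*window))
            converged = math.dist(x, new_x) < tol
            x = new_x
            if converged:
                break
        modes.append(x)
    # Points whose modes lie within bandwidth / 2 share a cluster.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def oversample_by_cluster(points, labels, n_new, seed=0):
    """Interpolate between two distinct members of the same cluster."""
    rng = random.Random(seed)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, set()).add(p)
    # Singleton clusters cannot yield a non-duplicate interpolation.
    pools = [sorted(c) for c in clusters.values() if len(c) >= 2]
    synthetic = []
    while pools and len(synthetic) < n_new:
        a, b = rng.sample(rng.choice(pools), 2)
        gap = rng.uniform(0.05, 0.95)  # stay away from both endpoints
        synthetic.append(tuple(u + gap * (v - u) for u, v in zip(a, b)))
    return synthetic


def remove_tomek_majority(minority, majority):
    """Drop majority points that form a Tomek link with a minority point."""
    data = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    if len(data) < 2:
        return list(majority)

    def nearest(i):
        others = (j for j in range(len(data)) if j != i)
        return min(others, key=lambda j: math.dist(data[i][0], data[j][0]))

    nn = [nearest(i) for i in range(len(data))]
    # A Tomek link is a mutual-nearest-neighbour pair with different labels.
    drop = {i for i, (_, y) in enumerate(data)
            if y == 0 and data[nn[i]][1] == 1 and nn[nn[i]] == i}
    return [p for i, (p, y) in enumerate(data) if y == 0 and i not in drop]


def resample_sketch(minority, majority, bandwidth=1.0, k=3, seed=0):
    """Noise filter -> mean shift -> oversample to balance -> Tomek cleaning."""
    clean_min = filter_noise(minority, majority, k)
    labels = mean_shift(clean_min, bandwidth)
    n_new = max(0, len(majority) - len(clean_min))
    new_min = clean_min + oversample_by_cluster(clean_min, labels, n_new, seed)
    return new_min, remove_tomek_majority(new_min, majority)


# Toy data: a 6x6 majority grid, two minority groups, one minority outlier.
majority = [(0.5 * i, 0.5 * j) for i in range(6) for j in range(6)]
minority = [(4.0, 4.0), (4.2, 4.1), (4.1, 4.3), (6.0, 6.0), (6.1, 6.2), (1.1, 1.1)]
new_min, kept_maj = resample_sketch(minority, majority)
print(len(new_min), len(kept_maj))  # prints "36 36"
```

On this toy data the outlier at (1.1, 1.1) sits inside the majority grid and is filtered out, mean shift finds two minority clusters, and the 31 synthetic points all fall strictly between members of a single cluster, so none duplicates an original sample. Clustering before interpolation is what keeps synthetic points from bridging the two separate minority groups.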
Pages: 216-231
Number of pages: 16