Joint Sample Position Based Noise Filtering and Mean Shift Clustering for Imbalanced Classification Learning

Cited by: 1
Authors
Duan, Lilong [1, 2]
Xue, Wei [1, 2]
Huang, Jun [1, 2]
Zheng, Xiao [1, 2]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, Maanshan 243032, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Source
TSINGHUA SCIENCE AND TECHNOLOGY | 2024, Vol. 29, No. 1
Keywords
Clustering algorithms; Filtering algorithms; Benchmark testing; Sampling methods; Information filters; Cleaning; Classification algorithms; imbalanced data classification; oversampling; noise filtering; clustering; OVERSAMPLING TECHNIQUE; SMOTE; PREDICTION;
DOI
10.26599/TST.2023.9010006
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The problem of imbalanced data classification learning has received much attention. Conventional classification algorithms are susceptible to data skew: they favor majority samples and ignore minority samples. The majority weighted minority oversampling technique (MWMOTE) is an effective approach to this problem; however, it may suffer from inadequate noise filtering and from synthesizing samples identical to the original minority data. To address these shortcomings, we propose an improved MWMOTE method named joint sample position based noise filtering and mean shift clustering (SPMSC). First, to effectively eliminate the effect of noisy samples, SPMSC uses a new noise filtering mechanism that determines whether a minority sample is noisy based on its position and distribution relative to the majority samples. Second, because MWMOTE may generate duplicate samples, we employ the mean shift algorithm to cluster minority samples and thereby reduce duplicate synthetic samples. Finally, data cleaning is performed on the processed data to further eliminate class overlap. Experiments on extensive benchmark datasets demonstrate the effectiveness of SPMSC compared with other sampling methods.
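The pipeline outlined in the abstract (filter noisy minority samples, cluster the remainder with mean shift, then synthesize new samples) can be illustrated with a minimal pure-Python sketch. This is not the authors' SPMSC implementation: the abstract does not specify the position-based filter, so a plain k-nearest-neighbour rule stands in for it, the mean shift uses a flat kernel with a hand-picked bandwidth, and the synthesis step is simple within-cluster interpolation rather than MWMOTE's weighted scheme. The function names `drop_noisy`, `mean_shift_clusters`, and `oversample`, along with all parameters and the toy data, are hypothetical.

```python
import math
import random


def drop_noisy(minority, majority, k=3):
    """Stand-in noise filter (assumption, not the paper's rule):
    drop a minority point whose k nearest neighbours are all majority."""
    kept = []
    for i, p in enumerate(minority):
        dists = [(math.dist(p, q), 0) for j, q in enumerate(minority) if j != i]
        dists += [(math.dist(p, q), 1) for q in majority]
        nearest = sorted(dists)[:k]
        if not all(is_major for _, is_major in nearest):
            kept.append(p)
    return kept


def mean_shift_clusters(points, bandwidth, n_iter=30):
    """Flat-kernel mean shift: repeatedly move each mode to the mean of the
    points within `bandwidth`, then group points whose modes coincide."""
    modes = [tuple(p) for p in points]
    for _ in range(n_iter):
        shifted = []
        for m in modes:
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            if near:
                m = tuple(sum(c) / len(near) for c in zip(*near))
            shifted.append(m)
        modes = shifted
    centres, clusters = [], []
    for p, m in zip(points, modes):
        for i, c in enumerate(centres):
            if math.dist(m, c) <= bandwidth / 2:
                clusters[i].append(p)
                break
        else:
            centres.append(m)
            clusters.append([p])
    return clusters


def oversample(clusters, n_new, rng):
    """Interpolate between two distinct members of the same cluster.
    Singleton clusters are skipped, since they could only yield copies."""
    usable = [c for c in clusters if len(c) >= 2]
    synthetic = []
    while usable and len(synthetic) < n_new:
        a, b = rng.sample(rng.choice(usable), 2)
        gap = rng.uniform(0.1, 0.9)  # stay strictly between the endpoints
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy data: (2.5, 2.5) is a minority point surrounded by majority points.
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (2.5, 2.5)]
majority = [(2.4, 2.5), (2.6, 2.5), (2.5, 2.7), (2.5, 2.3), (8.0, 8.0)]

clean = drop_noisy(minority, majority, k=3)
clusters = mean_shift_clusters(clean, bandwidth=1.0)
new_samples = oversample(clusters, n_new=4, rng=random.Random(0))
```

Restricting interpolation to pairs inside one mean shift cluster keeps synthetic points within a dense minority region, and keeping the interpolation gap away from 0 and 1 avoids reproducing an endpoint exactly. Whether SPMSC handles duplicates this way is not stated in the abstract.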
Pages: 216-231
Number of pages: 16