Synthetic minority oversampling technique based on natural neighborhood graph with subgraph cores for class-imbalanced classification
Times cited: 0
Author:
Zhao, Ming [1]
Affiliation:
[1] Chongqing Ind Polytech Coll, Mech Engn Inst, Yubei, Peoples R China
Keywords:
Class-imbalanced classification;
Oversampling technique;
Natural neighborhood graph;
Noise filter;
Interpolation;
SMOTE;
MAJORITY;
NOISY;
DOI:
10.1007/s11227-024-06655-z
Chinese Library Classification (CLC) number:
TP3 [Computing technology, computer technology];
Discipline classification code:
0812;
Abstract:
The synthetic minority oversampling technique (SMOTE) has been widely praised by researchers working on class-imbalanced classification. Although SMOTE eliminates the imbalance between classes, overgeneralization and imbalance within minority classes remain serious challenges. Filtering-based and direction-changing oversampling techniques in the SMOTE family have been developed to address these challenges, but they still suffer from the following drawbacks: a) many avoid overgeneralization by removing suspicious noise or by generating synthetic minority-class samples in safe regions, but fail to eliminate the imbalance within minority classes; b) some direction-changing techniques eliminate the imbalance within minority classes but cannot remove suspicious noise and have relatively high time complexity; and c) most depend heavily on more than two parameters. To overcome overgeneralization, within-minority-class imbalance, and the drawbacks above, this work presents an effective natural neighborhood graph-based synthetic minority oversampling technique (NaNG-SMOTE). First, a natural neighborhood graph (NaNG) is constructed on the class-imbalanced data. Second, heterogeneous and homogeneous edges are defined to identify and remove suspicious noise. Third, the NaNG is divided into separate subgraphs, each with a subgraph core, and the subgraphs (with their cores) belonging to minority classes are preserved. Fourth, a sampling weight is calculated for each preserved subgraph from its density and its number of minority-class vertices. Fifth, synthetic minority-class samples are generated according to the sampling weights by interpolating between each subgraph's core and each of its vertices. Extensive experiments show that NaNG-SMOTE outperforms eight sophisticated oversampling techniques in improving four representative classifiers on synthetic datasets and on benchmark datasets from industrial applications with various imbalance ratios.
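The five steps can be illustrated with a minimal Python sketch. It is hypothetical and is not the author's implementation, because the abstract does not give the exact definitions. The following choices are assumptions: the noise rule (a minority vertex with no homogeneous edge is treated as noise), the subgraph core (the vertex with the most homogeneous neighbors), and the sampling weight (mean distance to the core divided by subgraph size, so sparser and smaller subgraphs receive more samples). The function names `natural_neighbors` and `nang_smote_sketch` are also assumptions. The neighbor search is a simplified natural-neighbor procedure. It grows r until every point has a reverse neighbor, or until the count of points without one stops changing, and it then links mutual neighbors.

```python
import math
import random


def natural_neighbors(X):
    """Simplified natural-neighbor search; returns each point's natural neighbors."""
    n = len(X)
    # For every point, all other points sorted by Euclidean distance (O(n^2 log n)).
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: math.dist(X[i], X[j])) for i in range(n)]
    rev = [0] * n                    # reverse-neighbor count of each point
    knn = [set() for _ in range(n)]  # r nearest neighbors of each point
    r, prev_zero = 0, -1
    while r < n - 1:
        r += 1
        for i in range(n):
            j = order[i][r - 1]      # the r-th nearest neighbor of i
            knn[i].add(j)
            rev[j] += 1
        zero = sum(1 for c in rev if c == 0)
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    # Mutual neighbors become the (undirected) edges of the graph.
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def nang_smote_sketch(X, y, minority, n_new, seed=0):
    """Hypothetical NaNG-SMOTE-style oversampler (assumed rules, see lead-in)."""
    rng = random.Random(seed)
    nan = natural_neighbors(X)
    # Assumed noise filter: keep a minority vertex only if it has at least
    # one homogeneous (minority-minority) edge.
    keep = {i for i in range(len(X)) if y[i] == minority
            and any(y[j] == minority for j in nan[i])}
    # Connected subgraphs over homogeneous edges among the kept vertices.
    subgraphs, seen = [], set()
    for s in sorted(keep):
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in nan[v]:
                if w in keep and w not in seen:
                    seen.add(w)
                    stack.append(w)
        subgraphs.append(comp)
    if not subgraphs:
        return []
    cores, weights = [], []
    for comp in subgraphs:
        # Assumed core: most homogeneous neighbors, ties broken by lowest index.
        core = max(comp, key=lambda v: (len(nan[v] & keep), -v))
        spread = sum(math.dist(X[v], X[core]) for v in comp) / len(comp)
        cores.append(core)
        # Assumed weight: sparser and smaller subgraphs get more samples.
        weights.append((spread + 1e-12) / len(comp))
    synthetic = []
    for _ in range(n_new):
        k = rng.choices(range(len(subgraphs)), weights=weights)[0]
        core = cores[k]
        # Every kept subgraph has at least two vertices, so this list is non-empty.
        v = rng.choice([u for u in subgraphs[k] if u != core])
        gap = rng.random()  # interpolate on the segment from the core to v
        synthetic.append([c + gap * (p - c) for c, p in zip(X[core], X[v])])
    return synthetic


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.1, 2.9],
         [5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8], [0.15, 0.15]]
    y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(nang_smote_sketch(X, y, minority=1, n_new=4))
```

The sketch requires no k parameter because the neighborhood size comes from the natural-neighbor search. This mirrors the abstract's point about parameter dependence, although the sketch's other rules remain assumptions.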
Pages: 35