Synthetic minority oversampling technique based on natural neighborhood graph with subgraph cores for class-imbalanced classification

Times cited: 0
Authors
Zhao, Ming [1]
Affiliations
[1] Chongqing Ind Polytech Coll, Mech Engn Inst, Yubei, Peoples R China
Keywords
Class-imbalanced classification; Oversampling technique; Natural neighborhood graph; Noise filter; Interpolation; SMOTE; MAJORITY; NOISY
DOI
10.1007/s11227-024-06655-z
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The synthetic minority oversampling technique (SMOTE) has been praised by researchers in class-imbalanced classification. Although SMOTE eliminates imbalances between classes, overgeneralization and imbalances within minority classes present great challenges. Filtering-based or change-direction oversampling techniques of the SMOTE family have been developed to overcome these challenges; however, they still experience the following issues: a) many can avoid overgeneralization by removing suspicious noise or creating synthetic minority class samples in safe regions but fail to eliminate imbalances within minority classes; b) some change-direction oversampling techniques can eliminate imbalances within minority classes but cannot remove suspicious noise and have relatively high time complexity; and c) most heavily rely on more than two parameters. To overcome overgeneralization, imbalances within minority classes and the above drawbacks, this work presents an effective natural neighborhood graph-based synthetic minority oversampling technique (NaNG-SMOTE). First, a natural neighborhood graph (NaNG) is constructed on class-imbalanced data. Second, heterogeneous and homogeneous edges are defined to identify and remove suspicious noise. Third, NaNG is divided into separated subgraphs with subgraph cores, and then these subgraphs with subgraph cores of minority classes are preserved. Fourth, the sampling weight of each preserved subgraph is calculated based on the density and the number of minority class vertices. Fifth, synthetic minority class samples are created based on sampling weights and interpolation between subgraph cores and each vertex. Intensive experiments have proven that the NaNG-SMOTE outperforms 8 sophisticated oversampling techniques in improving 4 representative classifiers on synthetic or benchmark datasets from industrial applications with various imbalance ratios.
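The five steps named in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the abstract does not give the neighbor search, the noise rule, the definition of a subgraph core, or the weight formula. Here the graph is a mutual k-nearest-neighbor graph whose k grows until the number of points without a reverse neighbor stops changing, a vertex counts as noise when it has more heterogeneous than homogeneous edges, the core is the highest-degree vertex of a subgraph, and the weight is the vertex count times the mean core-to-vertex distance. All function names are hypothetical.

```python
import numpy as np

def natural_neighbor_graph(X):
    """Mutual-kNN adjacency; k grows until the number of points with no
    reverse neighbor stops changing (assumed stopping rule)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)              # a point is never its own neighbor
    order = np.argsort(dist, axis=1)[:, :n - 1]
    knn = np.zeros((n, n), dtype=bool)
    prev_orphans = -1
    for r in range(n - 1):
        knn[np.arange(n), order[:, r]] = True   # add each point's (r+1)-th neighbor
        orphans = int((~knn.any(axis=0)).sum())
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return knn & knn.T

def components(adj, nodes):
    """Connected components of adj restricted to `nodes` (depth-first search)."""
    nodes = set(int(v) for v in nodes)
    comps = []
    while nodes:
        stack, comp = [nodes.pop()], []
        while stack:
            v = stack.pop()
            comp.append(v)
            nbrs = [int(u) for u in np.flatnonzero(adj[v]) if int(u) in nodes]
            nodes.difference_update(nbrs)
            stack.extend(nbrs)
        comps.append(comp)
    return comps

def nang_smote_sketch(X, y, minority=1, n_new=None, seed=0):
    rng = np.random.default_rng(seed)
    adj = natural_neighbor_graph(X)                       # step 1: build the graph
    same = y[:, None] == y[None, :]
    homo = (adj & same).sum(axis=1)
    hetero = (adj & ~same).sum(axis=1)
    keep = hetero <= homo                                 # step 2: assumed noise rule
    adj = adj & same & keep[:, None] & keep[None, :]
    minority_idx = np.flatnonzero((y == minority) & keep)
    # step 3: minority subgraphs; singletons are dropped (assumption)
    comps = [c for c in components(adj, minority_idx) if len(c) >= 2]
    if not comps:
        return np.empty((0, X.shape[1]))
    if n_new is None:                                     # default: balance two classes
        n_new = max(int((y != minority).sum() - (y == minority).sum()), 0)
    cores, weights = [], []
    for c in comps:
        core = c[int(np.argmax(adj[c][:, c].sum(axis=1)))]  # assumed core: max degree
        spread = np.linalg.norm(X[c] - X[core], axis=1).mean()
        cores.append(core)
        weights.append(len(c) * (spread + 1e-12))         # step 4: placeholder weight
    weights = np.array(weights) / np.sum(weights)
    picks = rng.choice(len(comps), size=n_new, p=weights)
    synth = np.empty((n_new, X.shape[1]))
    for i, k in enumerate(picks):                         # step 5: interpolate from core
        others = [u for u in comps[k] if u != cores[k]]
        v = rng.choice(others)
        synth[i] = X[cores[k]] + rng.random() * (X[v] - X[cores[k]])
    return synth
```

Calling `nang_smote_sketch(X, y, minority=1)` with a float array `X` and an integer label array `y` returns only the synthetic minority rows. Because each row is interpolated between a core and a vertex of the same subgraph, it stays inside the region spanned by that subgraph.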
Pages: 35