HGDO: An oversampling technique based on hypergraph recognition and Gaussian distribution

Cited by: 1
Authors
Jia, Liyan [1]
Wang, Zhiping [1]
Sun, Pengfei [1]
Wang, Peiwen [2]
Affiliations
[1] Dalian Maritime Univ, Sch Sci, Dalian 116026, Peoples R China
[2] Northeastern Univ NEU, Sch Business Adm, Shenyang 110169, Peoples R China
Keywords
Data generation; Class imbalance learning; Oversampling; Hypergraph; Re-sampling method; SMOTE; Classification; Images
DOI
10.1016/j.ins.2024.120891
Chinese Library Classification (CLC): TP [automation technology; computer technology]
Discipline classification code: 0812
Abstract
The synthetic minority oversampling technique (SMOTE) is the most prevalent solution in class imbalance learning. Although SMOTE and its variants handle imbalanced data well in most cases, they fail to take full advantage of the structural information in the overall data, which leads to the propagation of noise. Some existing SMOTE variants remove noisy samples by adding an undersampling step. However, because of the complexity of the data distribution, it is difficult to accurately identify the samples that are truly noise, which lowers modeling quality. To this end, we propose an oversampling technique based on hypergraph recognition and Gaussian distribution (HGDO). First, neighborhood reconstruction is performed for each sample via sparse representation to build a hypergraph model, and outliers and noisy samples are filtered out according to this model. Then, the weight of each retained minority-class sample is determined from the distribution relationship between hyperedges and vertices. Finally, new samples are generated based on the Laplacian matrix and a Gaussian distribution to balance the dataset. Comprehensive experimental analysis demonstrates the superiority of HGDO over several popular SMOTE variants.
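The abstract outlines a three-step pipeline (hypergraph construction and filtering, hyperedge/vertex-based weighting, Gaussian generation). The Python sketch below is only an illustration of how such a pipeline could be arranged; it is not the authors' implementation. The Lasso-based neighborhood reconstruction stands in for the paper's sparse-representation step, the degree-based vertex weights stand in for the hyperedge-vertex weighting, and the per-sample isotropic Gaussian omits the Laplacian used in the actual generation step; the parameters k, alpha, and noise_thresh are assumptions.

import numpy as np
from sklearn.linear_model import Lasso

def hgdo_style_oversample(X_min, n_new, k=5, alpha=0.01, noise_thresh=0.1, seed=None):
    """Illustrative HGDO-style oversampling sketch (not the published method)."""
    rng = np.random.default_rng(seed)
    n, d = X_min.shape

    # Step 1: sparse neighborhood reconstruction. Each minority sample is
    # expressed as a sparse combination of the remaining minority samples;
    # the nonzero coefficients play the role of hyperedge memberships.
    H = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        coef = Lasso(alpha=alpha, max_iter=5000).fit(X_min[others].T, X_min[i]).coef_
        H[i, others] = np.abs(coef)

    # Samples that are rarely used to reconstruct any other sample are
    # treated as outliers/noise and dropped (simplified filtering rule).
    support = H.sum(axis=0)
    keep = support > noise_thresh * support.mean()
    if not keep.any():
        keep[:] = True
    X_keep, H_keep = X_min[keep], H[np.ix_(keep, keep)]

    # Step 2: vertex weights from the reconstruction structure; the normalized
    # in/out degree replaces the paper's hyperedge-vertex weighting here.
    deg = H_keep.sum(axis=0) + H_keep.sum(axis=1) + 1e-12
    w = deg / deg.sum()

    # Step 3: Gaussian generation around weighted seed samples, with the spread
    # set by the mean distance to the strongest reconstruction neighbors.
    seeds = rng.choice(len(X_keep), size=n_new, p=w)
    X_new = np.empty((n_new, d))
    for j, i in enumerate(seeds):
        nbrs = np.argsort(-H_keep[i])[:k]
        sigma = np.mean(np.linalg.norm(X_keep[nbrs] - X_keep[i], axis=1)) + 1e-8
        X_new[j] = rng.normal(loc=X_keep[i], scale=sigma / np.sqrt(d))
    return X_new

# Example usage (toy setting): X_min = X[y == 1]; synth = hgdo_style_oversample(X_min, n_new=200)

A faithful implementation would derive the sample weights and the generation step from the hypergraph Laplacian described in the abstract rather than from the simple degree weights used in this sketch.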
Pages: 43
Related papers (50 in total)
  • [1] Sun, Pengfei; Wang, Zhiping; Jia, Liyan; Wang, Lin. An oversampling technique based on noise detection and geometry. APPLIED SOFT COMPUTING, 2025, 170.
  • [2] Shao, Xuetao; Yan, Yuanting. Noise-Robust Gaussian Distribution Based Imbalanced Oversampling. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT II, 2024, 14488: 221-234.
  • [3] Xie, Yuxi; Qiu, Min; Zhang, Haibo; Peng, Lizhi; Chen, Zhenxiang. Gaussian Distribution Based Oversampling for Imbalanced Data Classification. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02): 667-679.
  • [4] Xie, Jie; Zhu, Mingying; Hu, Kai; Zhang, Jinglan. Instance hardness and multivariate Gaussian distribution-based oversampling technique for imbalance classification. PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (02): 735-749.
  • [5] Hassan, Masoud Muhammed; Eesa, Adel Sabry; Mohammed, Ahmed Jameel; Arabo, Wahab Kh. Oversampling Method Based on Gaussian Distribution and K-Means Clustering. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 69 (01): 451-469.
  • [6] Xu, Zhaozhao; Shen, Derong; Kou, Yue; Nie, Tiezheng. A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03): 3740-3753.
  • [7] Jia, Liyan; Wang, Zhiping; Sun, Pengfei; Xu, Zhaohui; Yang, Sibo. TDMO: Dynamic multi-dimensional oversampling for exploring data distribution based on extreme gradient boosting learning. INFORMATION SCIENCES, 2023, 649.
  • [8] Elreedy, Dina; Atiya, Amir F.; Kamalov, Firuz. A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning. MACHINE LEARNING, 2024, 113 (07): 4903-4923.
  • [9] Dai, Qi; Liu, Jian-wei; Zhao, Jia-Liang. Distance-based arranging oversampling technique for imbalanced data. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (02): 1323-1342.