An oversampling method for imbalanced data based on spatial distribution of minority samples SD-KMSMOTE

Cited by: 12
Authors
Yang, Wensheng [1]
Pan, Chengsheng [1]
Zhang, Yanyan [1]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Elect & Informat Engn, Intelligent Network & Informat Syst, Nanjing 210044, Peoples R China
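Keywords
DIAGNOSIS
DOI
10.1038/s41598-022-21046-1
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract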
With the rapid expansion of data, class imbalance has become increasingly prominent in fields such as medicine, finance, and networking, and it is typically addressed by oversampling. However, most existing oversampling methods sample at random or only within a particular region, which degrades classification results. To overcome these limitations, this study proposes SD-KMSMOTE, an oversampling method for imbalanced data based on the spatial distribution of minority samples. A noise-filtering pre-treatment step is added that considers the class information of each sample's near neighbours and removes noise from the existing minority class samples. On this basis, a new sample synthesis method is designed and rules for calculating weight values are constructed. The spatial distribution of the minority class samples is considered comprehensively: the samples are clustered, and sub-clusters that contain useful information are assigned larger weights and more synthetic samples. Experimental results show that, when the proposed method is used to expand imbalanced datasets from medicine and other fields, it outperforms existing methods in terms of precision, recall, F1 score, G-mean, and area under the curve (AUC).
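The pipeline outlined in the abstract (noise filtering, clustering of the minority class, per-cluster weighting, synthesis) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weight rule or the synthesis formula, so the sketch assumes a spread-based weight (mean distance to the cluster centroid) and plain SMOTE-style linear interpolation within each sub-cluster. All function names and parameters (`filter_noise`, `kmeans`, `oversample`, `k`, `n_clusters`) are hypothetical.

```python
import math
import random


def filter_noise(minority, majority, k=3):
    """Keep a minority point only if at least one of its k nearest
    neighbours is also a minority point (assumed noise rule)."""
    kept = []
    for i, p in enumerate(minority):
        neighbours = [(math.dist(p, q), 1) for j, q in enumerate(minority) if j != i]
        neighbours += [(math.dist(p, q), 0) for q in majority]
        nearest = sorted(neighbours)[:k]
        if any(is_minority for _, is_minority in nearest):
            kept.append(p)
    return kept


def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain Lloyd's k-means; returns the non-empty clusters."""
    centers = rng.sample(points, n_clusters)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def oversample(minority, majority, n_new, n_clusters=2, k=3, seed=0):
    """Return (cleaned minority points, synthetic minority points)."""
    rng = random.Random(seed)
    clean = filter_noise(minority, majority, k)
    if not clean:
        return [], []
    clusters = kmeans(clean, min(n_clusters, len(clean)), rng)

    # Assumed weight rule: more spread-out sub-clusters get more samples.
    spreads = []
    for c in clusters:
        center = tuple(sum(x) / len(c) for x in zip(*c))
        spreads.append(sum(math.dist(p, center) for p in c) / len(c))
    total = sum(spreads)
    if total > 0:
        weights = [s / total for s in spreads]
    else:
        weights = [1 / len(clusters)] * len(clusters)

    # Allocate n_new across clusters; the rounding remainder goes to the
    # highest-weight cluster so the counts sum to n_new.
    counts = [int(n_new * w) for w in weights]
    counts[weights.index(max(weights))] += n_new - sum(counts)

    # SMOTE-style interpolation between two points of the same cluster.
    synthetic = []
    for c, n in zip(clusters, counts):
        for _ in range(n):
            a, b = rng.choice(c), rng.choice(c)
            t = rng.random()
            synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return clean, synthetic
```

Because every synthetic point is interpolated between two members of the same sub-cluster, new samples stay inside regions already occupied by cleaned minority data, which is the general motivation for cluster-aware oversampling; the paper's actual weighting and synthesis rules may differ.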
Pages: 16
Related papers
50 in total
  • [1] An oversampling method for imbalanced data based on spatial distribution of minority samples SD-KMSMOTE
    Wensheng Yang
    Chengsheng Pan
    Yanyan Zhang
    Scientific Reports, 12
  • [2] Local distribution-based adaptive minority oversampling for imbalanced data classification
    Wang, Xinyue
    Xu, Jian
    Zeng, Tieyong
    Jing, Liping
    NEUROCOMPUTING, 2021, 422 : 200 - 213
  • [3] A Synthetic Minority Based on Probabilistic Distribution (SyMProD) Oversampling for Imbalanced Datasets
    Kunakorntum, Intouch
    Hinthong, Woranich
    Phunchongharn, Phond
    IEEE ACCESS, 2020, 8 : 114692 - 114704
  • [4] Gaussian Distribution Based Oversampling for Imbalanced Data Classification
    Xie, Yuxi
    Qiu, Min
    Zhang, Haibo
    Peng, Lizhi
    Chen, Zhenxiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02) : 667 - 679
  • [5] A No Parameter Synthetic Minority Oversampling Technique Based on Finch for Imbalanced Data
    Xu, Shoukun
    Li, Zhibang
    Yuan, Baohua
    Yang, Gaochao
    Wang, Xueyuan
    Li, Ning
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 367 - 378
  • [6] Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification
    Tao, Xinmin
    Guo, Xinyue
    Zheng, Yujia
    Zhang, Xiaohan
    Chen, Zhiyu
    KNOWLEDGE-BASED SYSTEMS, 2023, 277
  • [7] Importance-SMOTE: a synthetic minority oversampling method for noisy imbalanced data
    Liu, Jie
    SOFT COMPUTING, 2022, 26 (03) : 1141 - 1163
  • [9] Radius-SMOTE: A New Oversampling Technique of Minority Samples Based on Radius Distance for Learning From Imbalanced Data
    Pradipta, Gede Angga
    Wardoyo, Retantyo
    Musdholifah, Aina
    Sanjaya, I. Nyoman Hariyasa
    IEEE ACCESS, 2021, 9 : 74763 - 74777
  • [10] Counterfactual-based minority oversampling for imbalanced classification
    Wang, Shu
    Luo, Hao
    Huang, Shanshan
    Li, Qingsong
    Liu, Li
    Su, Guoxin
    Liu, Ming
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122