Research on expansion and classification of imbalanced data based on SMOTE algorithm

Cited by: 0
Authors
Shujuan Wang
Yuntao Dai
Jihong Shen
Jingxue Xuan
Affiliations
[1] Harbin Engineering University, College of Mathematical Sciences
[2] Qiqihar University, College of Science
Source
Scientific Reports, Vol. 11
DOI
Not available
Abstract
With the development of artificial intelligence, big-data classification technology has become a valuable aid in research on auxiliary medical diagnosis. However, because samples are collected under different conditions, medical big data are often imbalanced. The class-imbalance problem has been reported as a serious obstacle to the classification performance of many standard learning algorithms. The SMOTE algorithm can generate synthetic sample points at random to reduce the imbalance ratio, but its application is limited by the marginalization of the generated samples and by parameters being chosen blindly. To address this problem, this paper proposes an improved SMOTE algorithm based on the normal distribution, under which new sample points are more likely to lie close to the center of the minority class, thereby avoiding marginalization of the expanded data. Experiments show that expanding the imbalanced Pima, WDBC, WPBC, Ionosphere and Breast-cancer-wisconsin datasets with the proposed algorithm yields better classification performance than the original SMOTE algorithm. In addition, the parameter selection of the proposed algorithm is analyzed: in the designed experiments, classification performance is best when the parameters are chosen so that the distribution characteristics of the original data are best preserved.
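The abstract contrasts classic SMOTE, which places each synthetic point uniformly at random on the segment between a minority sample and one of its k nearest minority neighbours, with a variant that concentrates new points near the minority-class center. The plain-Python sketch below illustrates that contrast under stated assumptions. `smote` follows the standard SMOTE interpolation x_new = x_i + u(x_nn - x_i) with u ~ U(0, 1). `normal_smote`, its parameters `mu` and `sigma`, and the rule of pulling each SMOTE point toward the class centroid by a truncated-normal factor are hypothetical choices made for illustration; they are not the formula published in the paper.

```python
import math
import random


def _k_nearest(X, i, k):
    """Indices of the k samples in X closest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def _lerp(a, b, t):
    """Feature-wise linear interpolation a + t * (b - a)."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def smote(X_min, n_new, k=5, rng=None):
    """Classic SMOTE: x_new = x_i + u * (x_nn - x_i) with u ~ U(0, 1)."""
    rng = rng if rng is not None else random.Random()
    k = min(k, len(X_min) - 1)                    # needs at least two minority samples
    new = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))             # random minority seed sample
        j = rng.choice(_k_nearest(X_min, i, k))   # one of its k nearest minority neighbours
        new.append(_lerp(X_min[i], X_min[j], rng.random()))
    return new


def normal_smote(X_min, n_new, k=5, mu=0.5, sigma=0.2, rng=None):
    """HYPOTHETICAL center-biased variant (an assumption, not the paper's formula):
    each classic SMOTE point s is pulled toward the minority centroid c by a
    factor t ~ N(mu, sigma) clipped to [0, 1], i.e. x_new = s + t * (c - s)."""
    rng = rng if rng is not None else random.Random()
    n, d = len(X_min), len(X_min[0])
    c = [sum(x[f] for x in X_min) / n for f in range(d)]   # minority-class centroid
    new = []
    for s in smote(X_min, n_new, k, rng):
        t = min(1.0, max(0.0, rng.gauss(mu, sigma)))       # truncated-normal pull factor
        new.append(_lerp(s, c, t))
    return new


if __name__ == "__main__":
    gen = random.Random(0)
    # Toy minority class (synthetic, illustrative only; not one of the paper's datasets).
    X_min = [[gen.gauss(3, 1), gen.gauss(3, 1)] for _ in range(20)]
    c = [sum(col) / len(X_min) for col in zip(*X_min)]
    for name, f in [("SMOTE", smote), ("normal_smote", normal_smote)]:
        pts = f(X_min, 200, rng=random.Random(1))
        # Mean distance of the synthetic points to the minority centroid.
        print(name, sum(math.dist(p, c) for p in pts) / len(pts))
```

Because each pulled point is a convex combination of a SMOTE point and the centroid, it is never farther from the centroid than the SMOTE point it came from. In this sketch `mu` and `sigma` are the knobs that control how strongly points are drawn inward; they are stand-ins for, not reproductions of, the parameters whose selection the paper analyzes.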
Related papers
  • [1] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Wang, Shujuan
    Dai, Yuntao
    Shen, Jihong
    Xuan, Jingxue
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [2] Ensemble classification algorithm based improved SMOTE for imbalanced data
    Ning, Liu, 1600, Natsional'nyi Hirnychyi Universytet
  • [3] A Classification Method for Imbalanced Data Based on SMOTE and Fuzzy Rough Nearest Neighbor Algorithm
    Zhao, Weibin
    Xu, Mengting
    Jia, Xiuyi
    Shang, Lin
    ROUGH SETS, FUZZY SETS, DATA MINING, AND GRANULAR COMPUTING, RSFDGRC 2015, 2015, 9437 : 340 - 351
  • [4] A histogram SMOTE-based sampling algorithm with incremental learning for imbalanced data classification
    Liaw, Lawrence Chuin Ming
    Tan, Shing Chiang
    Goh, Pey Yun
    Lim, Chee Peng
    INFORMATION SCIENCES, 2025, 686
  • [5] Classification of Imbalanced Data by Combining the Complementary Neural Network and SMOTE Algorithm
    Jeatrakul, Piyasak
    Wong, Kok Wai
    Fung, Chun Che
    NEURAL INFORMATION PROCESSING: MODELS AND APPLICATIONS, PT II, 2010, 6444 : 152 - 159
  • [6] Classification of imbalanced data by using the SMOTE algorithm and locally linear embedding
    Wang, Juanjuan
    Xu, Mantao
    Wang, Hui
    Zhang, Jiwu
    2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006 : 1815 - +
  • [7] ACTIVE SMOTE for Imbalanced Medical Data Classification
    Sena, Raul
    Ben Hamida, Sana
    ADVANCES IN INFORMATION SYSTEMS, ARTIFICIAL INTELLIGENCE AND KNOWLEDGE MANAGEMENT, ICIKS 2023, 2024, 486 : 81 - 97
  • [8] RN-SMOTE: Reduced Noise SMOTE based on DBSCAN for enhancing imbalanced data classification
    Arafa, Ahmed
    El-Fishawy, Nawal
    Badawy, Mohammed
    Radad, Marwa
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (08) : 5059 - 5074
  • [9] An Improved SMOTE Imbalanced Data Classification Method Based on Support Degree
    Li, Kewen
    Zhang, Wenrong
    Lu, Qinghua
    Fang, Xianghua
    2014 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS (IIKI 2014), 2014 : 34 - 38
  • [10] A novel overlapping minimization SMOTE algorithm for imbalanced classification
    He, Yulin
    Lu, Xuan
    Fournier-Viger, Philippe
    Huang, Joshua Zhexue
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2024, 25 (09) : 1266 - 1281