An oversampling method for multi-class imbalanced data based on composite weights

Times cited: 9
Authors
Deng, Mingyang [1 ,2 ]
Guo, Yingshi [1 ]
Wang, Chang [1 ]
Wu, Fuwei [1 ]
Affiliations
[1] Changan Univ, Sch Automobile, Xian, Peoples R China
[2] Changchun Univ Technol, Coll Automobile Engn, Coll Humanities & Informat, Changchun, Peoples R China
Source
PLOS ONE | 2021 / Vol. 16 / No. 11
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
ALGORITHM; CLASSIFICATION; SMOTE;
DOI
10.1371/journal.pone.0259227
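Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09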
Abstract
To address oversampling of multi-class imbalanced data with small classes, and to improve classification accuracy on such data, we develop an oversampling method based on within-class ranking and weight setting. The algorithm sorts the samples within each class of a dataset by their distance to the hyperplane. It then performs iterative sampling within each class, and inter-class sampling at the boundaries of adjacent classes, according to a composite sampling weight built from data density and the within-class ranking. Finally, information is assigned to all newly generated samples. The algorithm is trained and tested on imbalanced datasets from the UCI repository, and a set of composite metrics, combined in a comprehensive evaluation method, is used to compare its performance with that of other algorithms. The results show that the proposed algorithm balances the number of samples across the classes of multi-class imbalanced data, and that the newly generated data preserve the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm achieves a higher classification accuracy of about 90%. We conclude that the algorithm is highly practical and generally applicable to imbalanced multi-class samples.
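The abstract describes the weighting idea only in outline: rank the samples of a class by distance to a hyperplane, combine that ranking with data density into a sampling weight, and draw new samples according to that weight. The following is a hypothetical, simplified sketch of the intra-class step only, not the authors' implementation. Everything concrete in it is an assumption: the hyperplane `(w, b)` is taken as given (for example from a linear classifier), the k-nearest-neighbour density estimate, the mixing parameter `alpha`, the choice to favour samples near the hyperplane and in sparse regions, and the SMOTE-style interpolation between neighbours. Inter-class boundary sampling and the information-assignment step are omitted, and the weights are computed once rather than updated across iterations.

```python
import math
import random


def hyperplane_distance(x, w, b):
    """Unsigned Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def knn_density(points, k):
    """Assumed density estimate: inverse of the mean distance to the k nearest same-class points."""
    dens = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        near = dists[:k] or [0.0]
        dens.append(1.0 / (sum(near) / len(near) + 1e-9))
    return dens


def composite_weights(points, w, b, k=3, alpha=0.5):
    """Hypothetical composite weight: alpha * rank term + (1 - alpha) * sparsity term.

    Both terms are normalised to sum to 1, so the weights also sum to 1.
    """
    n = len(points)
    # Rank samples by distance to the hyperplane; position 0 is the closest sample.
    order = sorted(range(n), key=lambda i: hyperplane_distance(points[i], w, b))
    rank_total = n * (n + 1) / 2
    rank_score = [0.0] * n
    for r, i in enumerate(order):
        rank_score[i] = (n - r) / rank_total  # closer to the hyperplane -> larger share
    inv_dens = [1.0 / d for d in knn_density(points, k)]
    total_inv = sum(inv_dens)
    sparsity = [v / total_inv for v in inv_dens]  # sparser region -> larger share
    raw = [alpha * rs + (1 - alpha) * sp for rs, sp in zip(rank_score, sparsity)]
    s = sum(raw)
    return [r / s for r in raw]


def oversample_class(points, target, w, b, k=3, alpha=0.5, seed=0):
    """Generate synthetic points until the class holds `target` samples."""
    if len(points) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    weights = composite_weights(points, w, b, k, alpha)
    synthetic = []
    while len(points) + len(synthetic) < target:
        # Draw a seed sample with probability proportional to its composite weight.
        i = rng.choices(range(len(points)), weights=weights)[0]
        # Pick one of the seed's k nearest same-class neighbours.
        neigh = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))[:k]
        j = rng.choice(neigh)
        # SMOTE-style interpolation on the segment between seed and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (c - a) for a, c in zip(points[i], points[j])])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    w, b = [1.0, -1.0], 0.0  # hypothetical hyperplane x0 - x1 = 0
    new = oversample_class(minority, target=10, w=w, b=b)
    print(len(new))  # 6
```

Normalising both terms before mixing keeps `alpha` interpretable as the share of weight given to the ranking; because every synthetic point lies on a segment between two real samples of the same class, the new data stay inside the region the class already occupies.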
Pages: 15
Related papers
50 in total
  • [21] Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning
    Fernandes, Everlandio R. Q.
    de Carvalho, Andre C. P. L. F.
    INFORMATION SCIENCES, 2019, 494 : 141 - 154
  • [22] An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md.
    2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019,
  • [23] Multi-class random forest model to classify wastewater treatment imbalanced data
    Distefano, Veronica
    Palma, Monica
    De Iaco, Sandra
    SOCIO-ECONOMIC PLANNING SCIENCES, 2024, 95
  • [24] Combined Cleaning and Resampling algorithm for multi-class imbalanced data with label noise
    Koziarski, Michal
    Wozniak, Michal
    Krawczyk, Bartosz
    KNOWLEDGE-BASED SYSTEMS, 2020, 204 (204)
  • [25] GMMSampling: a new model-based, data difficulty-driven resampling method for multi-class imbalanced data
    Naglik, Iwo
    Lango, Mateusz
    MACHINE LEARNING, 2024, 113 (08) : 5183 - 5202
  • [26] Efficient DANNLO classifier for multi-class imbalanced data on Hadoop
    Satyanarayana S.
    Tayar Y.
    Prasad R.S.R.
    International Journal of Information Technology, 2019, 11 (2) : 321 - 329
  • [27] New imbalanced bearing fault diagnosis method based on Sample-characteristic Oversampling TechniquE (SCOTE) and multi-class LS-SVM
    Wei, Jianan
    Huang, Haisong
    Yao, Liguo
    Hu, Yao
    Fan, Qingsong
    Huang, Dong
    APPLIED SOFT COMPUTING, 2021, 101
  • [28] Combating Mutuality with Difficulty Factors in Multi-class Imbalanced Data: A Similarity-based Hybrid Sampling
    Zheng, Zhong
    Yan, Yuanting
    Zhang, Yiwen
    Zhang, Yanping
    2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022, : 387 - 396
  • [29] Microclustering-Based Multi-Class Classification on Imbalanced Multi-Relational Datasets
    Pant, Hemlata
    Srivastava, Reena
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING, 2022, 17 (01)
  • [30] Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems
    Mondragon, Julio Cesar Munguia
    Lara, Erendira Rendon
    Eleuterio, Roberto Alejo
    Gutierrez, Everardo Efren Granda
    Lopez, Federico Del Razo
    MATHEMATICS, 2023, 11 (18)