An oversampling method for multi-class imbalanced data based on composite weights

Cited by: 9
Authors
Deng, Mingyang [1, 2]
Guo, Yingshi [1]
Wang, Chang [1]
Wu, Fuwei [1]
Affiliations
[1] Changan Univ, Sch Automobile, Xian, Peoples R China
[2] Changchun Univ Technol, Coll Automobile Engn, Coll Humanities & Informat, Changchun, Peoples R China
Source
PLOS ONE | 2021 / Vol. 16 / Issue 11
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
ALGORITHM; CLASSIFICATION; SMOTE
DOI
10.1371/journal.pone.0259227
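Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract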
To address oversampling for multi-class small-sample data and to improve classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The algorithm sorts the data within each class of the dataset by the distance from each original data point to the hyperplane. Iterative sampling is then performed within each class, and inter-class sampling is applied at the boundaries of adjacent classes, according to a sampling weight composed of data density and data sorting. Finally, information assignment is performed on all newly generated samples. The algorithm is trained and tested on UCI imbalanced datasets, and composite metrics are established to compare the proposed algorithm with other algorithms in a comprehensive evaluation. The results show that the proposed algorithm balances multi-class imbalanced data in terms of quantity, and that the newly generated data retain the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm reaches a higher classification accuracy of about 90%. We conclude that the algorithm is highly practical and broadly applicable to imbalanced multi-class samples.
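The abstract does not give formulas, so the following is only a minimal sketch of the general idea of weight-guided, SMOTE-style interpolation within one class. It is not the authors' algorithm. The function names, the `alpha` blend factor, the choice to favor sparse regions, and the `margin` list (a stand-in for each point's distance to the hyperplane) are all assumptions made for illustration. Inter-class boundary sampling and information assignment are not covered.

```python
import math
import random


def k_nearest(points, i, k):
    """Return indices of up to k points closest to points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def composite_weights(points, margin, k=2, alpha=0.5):
    """Blend a density term and a rank term into one sampling weight per point.

    margin[i] is a hypothetical stand-in for the distance of points[i]
    to a separating hyperplane; the abstract does not say how it is computed.
    Assumes the points are not all identical.
    """
    n = len(points)
    # Density term (assumption): mean distance to the k nearest neighbours,
    # so points in sparser regions receive more weight.
    sparsity = []
    for i in range(n):
        neigh = k_nearest(points, i, k)
        sparsity.append(sum(math.dist(points[i], points[j]) for j in neigh) / len(neigh))
    # Rank term (assumption): points nearer the hyperplane rank higher.
    order = sorted(range(n), key=lambda i: margin[i])
    rank = [0.0] * n
    for pos, i in enumerate(order):
        rank[i] = (n - pos) / n
    total_s, total_r = sum(sparsity), sum(rank)
    # Each term is normalised to sum to 1, so the blend also sums to 1.
    return [alpha * sparsity[i] / total_s + (1 - alpha) * rank[i] / total_r
            for i in range(n)]


def oversample(points, weights, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating weighted seeds
    toward one of their k nearest neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(points)), weights=weights)[0]
        j = rng.choice(k_nearest(points, i, k))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a)
                               for a, b in zip(points[i], points[j])))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    margins = [0.2, 1.0, 0.5, 3.0]  # made-up values for the demo
    w = composite_weights(minority, margins)
    print(oversample(minority, w, n_new=3))
```

Because every synthetic point is a convex combination of two existing points of the same class, the new data stay inside the region already spanned by that class. This is one simple way to keep generated samples close to the original distribution.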
Pages: 15
Related papers
50 in total
  • [41] Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems
    Mondragon, Julio Cesar Munguia
    Lara, Erendira Rendon
    Eleuterio, Roberto Alejo
    Gutirrez, Everardo Efren Granda
    Lopez, Federico Del Razo
    MATHEMATICS, 2023, 11 (18)
  • [42] A membership-based resampling and cleaning algorithm for multi-class imbalanced overlapping data
    Ma, Tingting
    Lu, Shuxia
    Jiang, Chen
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 240
  • [43] Clustering-Based Oversampling Algorithm for Multi-class Imbalance Learning
    Zhao, Haixia
    Wu, Jian
    JOURNAL OF CLASSIFICATION, 2025, 42 (01) : 205 - 220
  • [44] SCALA: Scaling algorithm for multi-class imbalanced classification: A novel algorithm specifically designed for multi-class multiple minority imbalanced data problems.
    Barzinji, Ala O.
    Ma, Jixin
    Ma, Chaoying
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2023, 2023, : 68 - 73
  • [45] Boosting methods for multi-class imbalanced data classification: an experimental review
    Jafar Tanha
    Yousef Abdi
    Negin Samadi
    Nazila Razzaghi
    Mohammad Asadpour
    Journal of Big Data, 7
  • [46] Boosting methods for multi-class imbalanced data classification: an experimental review
    Tanha, Jafar
    Abdi, Yousef
    Samadi, Negin
    Razzaghi, Nazila
    Asadpour, Mohammad
    JOURNAL OF BIG DATA, 2020, 7 (01)
  • [47] Improved multi-class classification approach for imbalanced big data on spark
    Tinku Singh
    Riya Khanna
    Manish Satakshi
    The Journal of Supercomputing, 2023, 79 : 6583 - 6611
  • [48] Concept Drift Detection from Multi-Class Imbalanced Data Streams
    Korycki, Lukasz
    Krawczyk, Bartosz
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 1068 - 1079
  • [49] Improved multi-class classification approach for imbalanced big data on spark
    Singh, Tinku
    Khanna, Riya
    Satakshi
    Kumar, Manish
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (06): : 6583 - 6611
  • [50] OAHO: an effective algorithm for multi-class learning from imbalanced data
    Murphey, Yi L.
    Wang, Haoxing
    Ou, Guobin
    Feldkamp, Lee A.
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 406 - +