An oversampling method for multi-class imbalanced data based on composite weights

Cited by: 11
Authors
Deng, Mingyang [1, 2]
Guo, Yingshi [1]
Wang, Chang [1]
Wu, Fuwei [1]
Affiliations
[1] Changan Univ, Sch Automobile, Xian, Peoples R China
[2] Changchun Univ Technol, Coll Automobile Engn, Coll Humanities & Informat, Changchun, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
ALGORITHM; CLASSIFICATION; SMOTE;
DOI
10.1371/journal.pone.0259227
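Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract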
To address the oversampling problem for multi-class small-sample data and to improve classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The algorithm first sorts the data within each class of the dataset by the distance from each original data point to the hyperplane. It then performs iterative sampling within each class and adopts inter-class sampling at the boundaries of adjacent classes, according to a sampling weight composed of data density and data ranking. Finally, information assignment is performed on all newly generated samples. Training and testing experiments are conducted on UCI imbalanced datasets, and composite metrics are established to compare the proposed algorithm with other algorithms in a comprehensive evaluation. The results show that the proposed algorithm balances the multi-class imbalanced data in terms of quantity, and that the newly generated data maintain the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm reaches a higher classification accuracy of about 90%. We conclude that the algorithm is highly practical and generally applicable to imbalanced multi-class samples.
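The abstract does not give the weight formula, so the following is only a minimal sketch of how a composite weight built from hyperplane-distance ranking and local density might drive intra-class oversampling. Everything specific here is an assumption rather than the authors' method: the function names `composite_weights` and `oversample_class`, the mixing parameter `alpha`, the neighbour count `k`, the choice to favour samples near the hyperplane and in sparse regions, and the SMOTE-style interpolation used to create new points. The hyperplane `(w, b)` is assumed to be supplied by some previously trained linear classifier. Inter-class boundary sampling and the final information-assignment step are not covered.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix with the diagonal masked out."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d


def composite_weights(X, w, b, k=3, alpha=0.5):
    """Hypothetical composite weight for one class (needs at least 2 samples).

    Assumption: samples closer to the hyperplane w.x + b = 0 and samples in
    sparser regions receive a larger sampling probability.
    """
    n = len(X)
    k = min(k, n - 1)
    # Distance of every sample to the hyperplane, then its rank (0 = closest).
    dist = np.abs(X @ w + b) / np.linalg.norm(w)
    rank = np.empty(n)
    rank[np.argsort(dist)] = np.arange(n)
    rank_score = 1.0 - rank / (n - 1)
    rank_score /= rank_score.sum()
    # Density proxy: mean distance to the k nearest neighbours in the class.
    knn_mean = np.sort(pairwise_distances(X), axis=1)[:, :k].mean(axis=1)
    sparsity = knn_mean / knn_mean.sum()
    # Convex mix of two normalised terms, so the result sums to 1.
    return alpha * rank_score + (1.0 - alpha) * sparsity


def oversample_class(X, w, b, n_new, k=3, alpha=0.5, seed=0):
    """Draw seed samples by composite weight and interpolate towards a
    random one of their k nearest neighbours (SMOTE-style, assumed here)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X) - 1)
    p = composite_weights(X, w, b, k, alpha)
    neighbours = np.argsort(pairwise_distances(X), axis=1)[:, :k]
    seeds = rng.choice(len(X), size=n_new, p=p)
    partners = neighbours[seeds, rng.integers(0, k, size=n_new)]
    u = rng.random((n_new, 1))
    return X[seeds] + u * (X[partners] - X[seeds])


if __name__ == "__main__":
    # Toy minority class and an assumed hyperplane x + y - 1 = 0.
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
    w, b = np.array([1.0, 1.0]), -1.0
    print(oversample_class(X_min, w, b, n_new=5, k=2))
```

Because each synthetic point is a convex combination of two existing samples of the same class, it stays inside the bounding box of that class, which is one simple way to keep new data close to the original distribution.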
Pages: 15