Safe sample screening based sampling method for imbalanced data

Cited by: 0
Authors
Shi H. [1]
Liu Y. [1]
Ji S. [1]
Affiliation
[1] College of Information, Shanxi University of Finance and Economics, Taiyuan
Source
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence | 2019, Vol. 32, No. 06
Funding
National Natural Science Foundation of China
Keywords
Imbalanced Data; Safe Sample Screening; Synthetic Minority Oversampling Technique (SMOTE); Undersampling; Imbalance Ratio
DOI
10.16451/j.cnki.issn1003-6059.201906007
Chinese Library Classification number
Subject classification number
Abstract
Undersampling may discard valuable information, and the synthetic minority oversampling technique (SMOTE) may aggravate the overlap between the majority class and the minority class. This paper proposes a sampling method, Screening_SMOTE, that combines undersampling based on safe sample screening with SMOTE. First, safe screening rules are used to identify and discard some of the non-informative instances and noise instances in the majority class. Then, minority class instances generated by SMOTE are added to the screened dataset. Because the undersampling step removes only non-informative and noisy majority instances, informative instances are retained and class overlap is relieved. The experimental results show that Screening_SMOTE is an effective method for rebalancing imbalanced datasets, especially high-dimensional ones. © 2019, Science Press. All rights reserved.
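The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the screening step is a simple nearest-neighbour noise filter standing in for the paper's safe screening rules, which this record does not specify. The oversampling step follows the standard SMOTE idea of interpolating between a minority instance and one of its minority-class neighbours.

```python
import math
import random


def screen_majority(majority, minority, k=3):
    """Stand-in for safe sample screening (NOT the paper's rule).

    Drops a majority instance as noise when all of its k nearest
    neighbours, taken over both classes, belong to the minority class.
    """
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for i, p in enumerate(majority):
        # Exclude the query point itself by index, then rank by distance.
        others = [item for j, item in enumerate(labelled) if j != i]
        others.sort(key=lambda item: math.dist(p, item[0]))
        if not all(label == 1 for _, label in others[:k]):
            kept.append(p)
    return kept


def smote(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic minority points by linear interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority instances")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself).
        candidates = [q for j, q in enumerate(minority) if j != i]
        candidates.sort(key=lambda q: math.dist(base, q))
        neighbour = rng.choice(candidates[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def screening_smote(majority, minority, k=3, seed=0):
    """Screen the majority class, then oversample the minority class
    until both classes have the same size (an assumed balancing target)."""
    kept = screen_majority(majority, minority, k)
    n_new = max(0, len(kept) - len(minority))
    new_points = smote(minority, n_new, k, random.Random(seed)) if n_new else []
    return kept, list(minority) + new_points
```

Calling `screening_smote(majority, minority)` with two lists of coordinate tuples returns the screened majority instances and the enlarged minority set. In practice the screening step would be replaced by the safe screening rules from the paper, and the balancing target would be chosen according to the desired imbalance ratio.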
Pages: 545-556
Page count: 11