Semi-supervised Classification Based Mixed Sampling for Imbalanced Data

Times cited: 10
Authors
Zhao, Jianhua [1]
Liu, Ning [2]
Affiliations
[1] Shangluo Univ, Coll Math & Comp Applicat, Shangluo 726000, Peoples R China
[2] Shangluo Univ, Coll Econ Management, Shangluo 726000, Peoples R China
Keywords
semi-supervised learning; imbalanced data; over sampling; under sampling; ensemble learning; ALGORITHM; SMOTE
DOI
10.1515/phys-2019-0103
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
In practical applications, there are large amounts of imbalanced data in which only a small number of samples are labeled. To improve classification performance on this kind of problem, this paper proposes a semi-supervised learning algorithm based on mixed sampling for imbalanced data classification (S2MAID), which combines semi-supervised learning, over sampling, under sampling and ensemble learning. First, an under sampling algorithm, UD-density, is provided to select samples with high information content from the majority class set for semi-supervised learning. Second, a safe supervised-learning method is used to label unlabeled samples and expand the labeled sample set. Third, an over sampling algorithm, SMOTE-density, is provided to balance the imbalanced data set. Fourth, an ensemble technique is used to generate a strong classifier. Finally, experiments are carried out on imbalanced data containing only a few labeled samples, simulating the semi-supervised learning process. The experimental results show that the proposed S2MAID achieves better classification performance.
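The mixed-sampling idea in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify UD-density or SMOTE-density, so the sketch substitutes plain random under sampling and plain SMOTE-style interpolation, and it omits the semi-supervised labeling and ensemble steps. The function names, parameters, and toy data are all hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Plain SMOTE-style over sampling (assumed stand-in for SMOTE-density).

    Each synthetic point lies on the segment between a random minority
    point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Random under sampling (assumed stand-in for UD-density)."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


# Toy data: 20 majority points, 4 minority points (hypothetical values).
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(100.0, 100.0), (101.0, 100.0), (100.0, 101.0), (101.0, 101.0)]

kept_majority = random_undersample(majority, n_keep=10)
balanced_minority = minority + smote_oversample(minority, n_new=6)
```

After this step both classes hold 10 samples, which is the balanced set a downstream classifier or ensemble would be trained on.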
Pages: 975-983
Number of pages: 9