Semi-supervised Classification Based Mixed Sampling for Imbalanced Data

Cited by: 10
Authors
Zhao, Jianhua [1]
Liu, Ning [2]
Affiliations
[1] Shangluo Univ, Coll Math & Comp Applicat, Shangluo 726000, Peoples R China
[2] Shangluo Univ, Coll Econ Management, Shangluo 726000, Peoples R China
Source
OPEN PHYSICS | 2019 / Vol. 17 / Issue 01
Keywords
semi-supervised learning; imbalanced data; over sampling; under sampling; ensemble learning; ALGORITHM; SMOTE;
DOI
10.1515/phys-2019-0103
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702 ;
Abstract
In practical applications, there is a large amount of imbalanced data in which only a small number of samples are labeled. To improve classification performance on this kind of problem, this paper proposes a semi-supervised learning algorithm based on mixed sampling for imbalanced data classification (S2MAID), which combines semi-supervised learning, over-sampling, under-sampling and ensemble learning. First, an under-sampling algorithm, UD-density, is provided to select samples with high information content from the majority class set for semi-supervised learning. Second, a safe supervised-learning method is used to label unlabeled samples and expand the labeled sample set. Third, an over-sampling algorithm, SMOTE-density, is provided to turn the imbalanced data set into a balanced one. Fourth, an ensemble technique is used to generate a strong classifier. Finally, experiments are carried out on imbalanced data containing only a few labeled samples, and the semi-supervised learning process is simulated. The experimental results show that the proposed S2MAID achieves better classification performance.
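The over-sampling step can be illustrated with a minimal sketch. The abstract does not specify how SMOTE-density works, so the code below is plain SMOTE-style interpolation (the baseline technique named in the keywords), not the authors' variant. The function name `smote_oversample` and its parameters are assumptions made for illustration only.

```python
import numpy as np

def smote_oversample(minority, n_new, k=5, seed=0):
    """Plain SMOTE-style interpolation between minority samples.

    Hypothetical helper: this is NOT the paper's SMOTE-density algorithm,
    whose details are not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a point has at most n - 1 other points as neighbours

    # Pairwise Euclidean distances between minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest neighbours,
    # then place a synthetic point on the segment between them.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])
```

Calling `smote_oversample(X_minority, n_majority - n_minority)` would produce enough synthetic minority samples to match the majority class size, which is the balancing effect the abstract attributes to the over-sampling step.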
Pages: 975-983
Page count: 9
Related papers
38 in total
  • [1] Transductive hyperspectral image classification: toward integrating spectral and relational features via an iterative ensemble system
    Appice, Annalisa
    Guccione, Pietro
    Malerba, Donato
    [J]. MACHINE LEARNING, 2016, 103 (03) : 343 - 375
  • [2] MWMOTE-Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning
    Barua, Sukarna
    Islam, Md. Monirul
    Yao, Xin
    Murase, Kazuyuki
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (02) : 405 - 425
  • [3] Semi-supervised learning on Riemannian manifolds
    Belkin, M
    Niyogi, P
    [J]. MACHINE LEARNING, 2004, 56 (1-3) : 209 - 239
  • [4] Blum A., 1998, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, P92, DOI 10.1145/279943.279962
  • [5] Novel Cost-Sensitive Approach to Improve the Multilayer Perceptron Performance on Imbalanced Data
    Castro, Cristiano L.
    Braga, Antonio P.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2013, 24 (06) : 888 - 899
  • [6] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 : 321 - 357
  • [7] Dewasurendra M., 2018, Applied Mathematics and Nonlinear Sciences, V1, P1
  • [8] Du Limin, 2018, Application Research of Computers, V35, P342
  • [9] Semi-supervised learning using multiple clusterings with limited labeled data
    Forestier, Germain
    Wemmert, Cedric
    [J]. INFORMATION SCIENCES, 2016, 361 : 48 - 65
  • [10] A neural network algorithm for semi-supervised node label learning from unbalanced data
    Frasca, Marco
    Bertoni, Alberto
    Re, Matteo
    Valentini, Giorgio
    [J]. NEURAL NETWORKS, 2013, 43 : 84 - 98