Diverse training dataset generation based on a multi-objective optimization for semi-supervised classification

Cited by: 19
Authors
Donyavi, Zahra [1 ]
Asadi, Shahrokh [1 ]
Affiliations
[1] Univ Tehran, Coll Farabi, Fac Engn, Data Min Lab, Tehran, Iran
Keywords
Self-labeled; Semi-supervised learning; Evolutionary multi-objective optimization; Data density function; NSGA-II; Data sets; SMOTE; Software
DOI
10.1016/j.patcog.2020.107543
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The self-labeled technique is a type of semi-supervised classification that can be used when labeled data are scarce. Although existing self-labeled techniques show promise in many areas of classification and pattern recognition, they commonly mislabel data. This problem stems from the shortage of labeled data and from the unfavorable distribution of data in the problem space. To address it, we propose a synthetic labeled-data generation method based on accuracy and density. The positions of generated data points are refined by a multi-objective evolutionary algorithm with two objectives: accuracy and density. The density function generates data with an appropriate distribution and diversity in feature space, whereas the accuracy function eliminates outliers. The advantage of the proposed method over existing ones is that it considers the accuracy and the distribution of generated data in feature space simultaneously. We applied the proposed method to four self-labeled techniques with different characteristics, i.e., Democratic-co, Tri-training, Co-forest, and Co-bagging. The results show that the proposed method is superior to existing methods in terms of classification accuracy. Its superiority is also confirmed over other data generation methods such as SMOTE, Borderline-SMOTE, Safe-Level-SMOTE, and SMOTE-RSB. (c) 2020 Elsevier Ltd. All rights reserved.
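Illustrative sketch (not from the paper): the abstract describes evolving the positions of synthetic labeled points with NSGA-II under two objectives, density (to match the class distribution and keep diversity) and accuracy (to reject outliers). The minimal Python sketch below illustrates that idea under loudly stated assumptions: pymoo's NSGA-II stands in for the authors' implementation, a kernel density estimate stands in for the paper's density function, and a k-NN class posterior stands in for its accuracy function.

# Hypothetical sketch: candidate synthetic points for one class via NSGA-II
# (pymoo), trading a density proxy against an accuracy proxy. The KDE and
# k-NN posterior are illustrative stand-ins, not the paper's functions.
from sklearn.datasets import make_classification
from sklearn.neighbors import KernelDensity, KNeighborsClassifier
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

class SyntheticPointProblem(ElementwiseProblem):
    # Each decision vector x is one candidate point in feature space,
    # bounded by the min/max of the labeled data per feature.
    def __init__(self, X, y, target):
        super().__init__(n_var=X.shape[1], n_obj=2,
                         xl=X.min(axis=0), xu=X.max(axis=0))
        self.target = target
        # Density proxy: KDE fitted on labeled points of the target class.
        self.kde = KernelDensity(bandwidth=0.5).fit(X[y == target])
        # Accuracy proxy: posterior of a k-NN classifier on all labeled data.
        self.clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    def _evaluate(self, x, out, *args, **kwargs):
        point = x.reshape(1, -1)
        log_density = self.kde.score_samples(point)[0]
        p_target = self.clf.predict_proba(point)[0, self.target]
        # pymoo minimizes, so negate both objectives to maximize them.
        out["F"] = [-log_density, -p_target]

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
res = minimize(SyntheticPointProblem(X, y, target=1), NSGA2(pop_size=40),
               ("n_gen", 30), seed=1, verbose=False)
print(res.X[:3])  # a few Pareto-optimal synthetic candidates for class 1

The Pareto set in res.X then holds candidate synthetic points for the target class that trade off local density against classifier confidence; the paper's actual objective definitions and its selection among Pareto solutions are not reproduced here.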
Pages: 16
References
65 in total
  • [11] Blum A, Mitchell T. Combining labeled and unlabeled data with co-training. Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998: 92-100. DOI: 10.1145/279943.279962
  • [12] Breiman L. Bagging predictors. Machine Learning, 1996, 24(2): 123-140.
  • [13] Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C. Safe-Level-SMOTE: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. Lecture Notes in Artificial Intelligence, 2009, 5476: 475-482. DOI: 10.1007/978-3-642-01307-2_43
  • [14] Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • [15] Cheng YZ. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(8): 790-799.
  • [16] Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20(3): 273-297.
  • [17] Cover TM, Hart PE. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
  • [18] Proceedings of the 2nd INNS Conference on Big Data, 2016.
  • [19] del Rio S, Lopez V, Benitez JM, Herrera F. A MapReduce approach to address big data classification problems based on the fusion of linguistic fuzzy rules. International Journal of Computational Intelligence Systems, 2015, 8(3): 422-437.
  • [20] Demsar J. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 2006, 7: 1-30.