Diverse training dataset generation based on a multi-objective optimization for semi-supervised classification

Cited by: 19
Authors
Donyavi, Zahra [1]
Asadi, Shahrokh [1]
Affiliations
[1] Univ Tehran, Coll Farabi, Fac Engn, Data Min Lab, Tehran, Iran
Keywords
Self-labeled; Semi-supervised learning; Evolutionary multi-objective optimization; Data density function; NSGA-II; DATA SETS; SMOTE; SOFTWARE
DOI
10.1016/j.patcog.2020.107543
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-labeled techniques are a family of semi-supervised classification methods that can be used when labeled data are scarce. Although existing self-labeled techniques show promise in many areas of classification and pattern recognition, they often label data incorrectly. The causes are the shortage of labeled data and the inappropriate distribution of data in the problem space. To address this problem, we propose a method for generating synthetic labeled data based on accuracy and density. The positions of the generated data are improved by a multi-objective evolutionary algorithm with two objectives: accuracy and density. The density function generates data with an appropriate distribution and diversity in feature space, whereas the accuracy function eliminates outlier data. The advantage of the proposed method over existing ones is that it considers the accuracy and the distribution of the generated data in feature space simultaneously. We applied the proposed method to four self-labeled techniques with different features: Democratic-co, Tri-training, Co-forest, and Co-bagging. The results show that the proposed method is superior to existing methods in terms of classification accuracy. Its superiority is also confirmed over other data generation methods such as SMOTE, Borderline SMOTE, Safe-level SMOTE and SMOTE-RSB. (c) 2020 Elsevier Ltd. All rights reserved.
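The two-objective idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the k-nearest-neighbour agreement score, the nearest-point spread score, and every function name below are assumptions chosen for illustration, and only the non-dominated filtering step that NSGA-II builds on is shown, not the full evolutionary loop.

```python
import math

def knn_agreement(point, label, labeled, k=3):
    """Accuracy proxy (assumed): share of the k nearest labeled points with the same label."""
    nearest = sorted(labeled, key=lambda item: math.dist(point, item[0]))[:k]
    return sum(1 for _, y in nearest if y == label) / len(nearest)

def spread(point, others):
    """Density proxy (assumed): distance to the closest existing point; larger is less crowded."""
    return min(math.dist(point, o) for o in others)

def pareto_front(scores):
    """Indices of score tuples that no other tuple dominates (both objectives maximised)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Toy labeled set with two classes, plus three hypothetical synthetic candidates labeled "a".
labeled = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
           ((1.0, 1.0), "b"), ((0.9, 1.2), "b"), ((1.1, 0.8), "b")]
candidates = [((0.1, 0.1), "a"), ((0.6, 0.6), "a"), ((1.0, 0.95), "a")]

existing = [x for x, _ in labeled]
scores = [(knn_agreement(x, y, labeled), spread(x, existing)) for x, y in candidates]
front = pareto_front(scores)  # candidates worth keeping under both objectives
print(front)
```

In this toy run the third candidate sits inside the "b" cluster while carrying label "a", so it scores worse on both proxies and is filtered out as an outlier; the other two represent different trade-offs between agreement and spread and both remain on the front.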
Pages: 16