Diverse training dataset generation based on a multi-objective optimization for semi-supervised classification

Cited by: 19
Authors
Donyavi, Zahra [1 ]
Asadi, Shahrokh [1 ]
Affiliations
[1] Univ Tehran, Coll Farabi, Fac Engn, Data Min Lab, Tehran, Iran
Keywords
Self-labeled; Semi-supervised learning; Evolutionary multi-objective optimization; Data density function; NSGA-II; Data sets; SMOTE; Software
DOI
10.1016/j.patcog.2020.107543
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The self-labeled technique is a type of semi-supervised classification that can be used when labeled data are scarce. Although existing self-labeled techniques show promise in many areas of classification and pattern recognition, they commonly mislabel data. This problem stems from the shortage of labeled data and from the unfavorable distribution of data in the problem space. To address it, we propose a synthetic labeled-data generation method based on accuracy and density. The positions of generated data points are refined by a multi-objective evolutionary algorithm with two objectives: accuracy and density. The density function generates data with an appropriate distribution and diversity in feature space, whereas the accuracy function eliminates outliers. The advantage of the proposed method over existing ones is that it considers the accuracy and the distribution of generated data in feature space simultaneously. We applied the proposed method to four self-labeled techniques with different characteristics, i.e., Democratic-co, Tri-training, Co-forest, and Co-bagging. The results show that the proposed method is superior to existing methods in terms of classification accuracy. Its superiority is also confirmed over other data generation methods such as SMOTE, Borderline-SMOTE, Safe-Level-SMOTE, and SMOTE-RSB. (c) 2020 Elsevier Ltd. All rights reserved.
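Illustrative sketch (not from the paper): the abstract describes evolving the positions of synthetic labeled points with NSGA-II under two objectives, density (to match the class distribution and keep diversity) and accuracy (to reject outliers). The minimal Python sketch below illustrates that idea under loudly stated assumptions: pymoo's NSGA-II stands in for the authors' implementation, a kernel density estimate stands in for the paper's density function, and a k-NN class posterior stands in for its accuracy function.

# Hypothetical sketch: candidate synthetic points for one class via NSGA-II
# (pymoo), trading a density proxy against an accuracy proxy. The KDE and
# k-NN posterior are illustrative stand-ins, not the paper's functions.
from sklearn.datasets import make_classification
from sklearn.neighbors import KernelDensity, KNeighborsClassifier
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

class SyntheticPointProblem(ElementwiseProblem):
    # Each decision vector x is one candidate point in feature space,
    # bounded by the min/max of the labeled data per feature.
    def __init__(self, X, y, target):
        super().__init__(n_var=X.shape[1], n_obj=2,
                         xl=X.min(axis=0), xu=X.max(axis=0))
        self.target = target
        # Density proxy: KDE fitted on labeled points of the target class.
        self.kde = KernelDensity(bandwidth=0.5).fit(X[y == target])
        # Accuracy proxy: posterior of a k-NN classifier on all labeled data.
        self.clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    def _evaluate(self, x, out, *args, **kwargs):
        point = x.reshape(1, -1)
        log_density = self.kde.score_samples(point)[0]
        p_target = self.clf.predict_proba(point)[0, self.target]
        # pymoo minimizes, so negate both objectives to maximize them.
        out["F"] = [-log_density, -p_target]

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
res = minimize(SyntheticPointProblem(X, y, target=1), NSGA2(pop_size=40),
               ("n_gen", 30), seed=1, verbose=False)
print(res.X[:3])  # a few Pareto-optimal synthetic candidates for class 1

The Pareto set in res.X then holds candidate synthetic points for the target class that trade off local density against classifier confidence; the paper's actual objective definitions and its selection among Pareto solutions are not reproduced here.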
Pages: 16
References
65 in total
  • [11] Blum A, Mitchell T. Combining labeled and unlabeled data with co-training. Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998: 92-100. DOI: 10.1145/279943.279962
  • [12] Breiman L. Bagging predictors. Machine Learning, 1996, 24(2): 123-140.
  • [13] Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C. Safe-Level-SMOTE: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. Lecture Notes in Artificial Intelligence, 2009, 5476: 475-482. DOI: 10.1007/978-3-642-01307-2_43
  • [14] Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • [15] Cheng YZ. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(8): 790-799.
  • [16] Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20(3): 273-297.
  • [17] Cover TM, Hart PE. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
  • [18] Proceedings of the 2nd INNS Conference on Big Data, 2016.
  • [19] del Rio S, Lopez V, Benitez JM, Herrera F. A MapReduce approach to address big data classification problems based on the fusion of linguistic fuzzy rules. International Journal of Computational Intelligence Systems, 2015, 8(3): 422-437.
  • [20] Demsar J. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 2006, 7: 1-30.