Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study

Cited by: 399
Authors
Triguero, Isaac [1]
Garcia, Salvador [2]
Herrera, Francisco [1]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Res Ctr Informat & Commun Technol CITIC UGR, E-18071 Granada, Spain
[2] Univ Jaen, Dept Comp Sci, Jaen 23071, Spain
Keywords
Learning from unlabeled data; Semi-supervised learning; Self-training; Co-training; Multi-view learning; Classification; STATISTICAL COMPARISONS; UNLABELED DATA; FRAMEWORK; ALGORITHMS; CLASSIFICATION; CLASSIFIERS; KEEL; TOOL;
DOI
10.1007/s10115-013-0706-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semi-supervised classification methods are suitable tools to tackle training sets with large amounts of unlabeled data and a small quantity of labeled data. This problem has been addressed by several approaches with different assumptions about the characteristics of the input data. Among them, self-labeled techniques follow an iterative procedure, aiming to obtain an enlarged labeled data set, in which they accept that their own predictions tend to be correct. In this paper, we provide a survey of self-labeled methods for semi-supervised classification. From a theoretical point of view, we propose a taxonomy based on the main characteristics presented in them. Empirically, we conduct an exhaustive study that involves a large number of data sets, with different ratios of labeled data, aiming to measure their performance in terms of transductive and inductive classification capabilities. The results are contrasted with nonparametric statistical tests. Note is then taken of which self-labeled models are the best-performing ones. Moreover, a semi-supervised learning module has been developed for the Knowledge Extraction based on Evolutionary Learning software, integrating analyzed methods and data sets.
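The iterative procedure described in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation or any method from the KEEL module: the nearest-centroid base classifier, the margin-based confidence score, the `threshold` and `max_iter` parameters, and all function names are assumptions chosen only to keep the example short and dependency-free.

```python
from math import dist


def nearest_centroid_fit(X, y):
    """Return {label: centroid} computed from the labeled points."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_with_confidence(centroids, point):
    """Return (label, confidence) with a margin-based confidence in [0, 1].

    The confidence is an illustrative assumption: the relative gap between
    the distances to the two closest centroids.
    """
    ranked = sorted((dist(point, c), lab) for lab, c in centroids.items())
    best_d, best_lab = ranked[0]
    if len(ranked) == 1:
        return best_lab, 1.0
    second_d = ranked[1][0]
    total = best_d + second_d
    conf = 0.0 if total == 0 else (second_d - best_d) / total
    return best_lab, conf


def self_training(X_lab, y_lab, X_unlab, threshold=0.5, max_iter=10):
    """Enlarge the labeled set with the classifier's own confident predictions."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        if not pool:
            break
        # Retrain on the current (possibly enlarged) labeled set.
        centroids = nearest_centroid_fit(X_lab, y_lab)
        scored = [(p, *predict_with_confidence(centroids, p)) for p in pool]
        accepted = [(p, lab) for p, lab, conf in scored if conf >= threshold]
        if not accepted:
            break  # no prediction is confident enough; stop early
        for p, lab in accepted:
            X_lab.append(p)
            y_lab.append(lab)
        # Points below the threshold stay unlabeled for the next iteration.
        pool = [p for p, lab, conf in scored if conf < threshold]
    return X_lab, y_lab, pool
```

Calling `self_training` with a few labeled points and a pool of unlabeled points returns the enlarged labeled set, its labels, and any points that never reached the confidence threshold. A real study would swap in a stronger base classifier and a calibrated confidence measure.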
Pages: 245-284
Page count: 40