Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme

Times Cited: 13
Authors
Fazakis, Nikos [1 ]
Kanas, Vasileios G. [1 ]
Aridas, Christos K. [2 ]
Karlos, Stamatis [2 ]
Kotsiantis, Sotiris [2 ]
Affiliations
[1] Univ Patras, Dept Elect & Comp Engn, Wired Commun Lab, Achaia 26504, Greece
[2] Univ Patras, Dept Math, Educ Software Dev Lab, Achaia 26504, Greece
Keywords
active learning; semi-supervised learning; self-training; classification; combination of learning methods; ROTATION FOREST; REGRESSION; SOFTWARE;
DOI
10.3390/e21100988
Chinese Library Classification
O4 [Physics];
Discipline Classification Code
0702
Abstract
One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a dataset. In this context, a variety of learning methods have been studied in the literature that aim to utilize the vast amounts of unlabeled data efficiently during the learning process. The most common approaches tackle such problems by applying either active learning or semi-supervised learning methods individually. In this work, a combination of active learning and semi-supervised learning is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. The entropy and the class-probability distribution of the unlabeled set are used as effective and robust metrics for selecting the most suitable unlabeled examples with which to augment the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the baseline supervised, semi-supervised, and active learning approaches on fifty-five benchmark datasets.
Pages: 28
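The scheme described in the abstract can be sketched as a single self-training loop: at each iteration, a base learner's class probabilities over the unlabeled pool are ranked by entropy; the lowest-entropy (most confident) examples are pseudo-labeled (the semi-supervised step), while the highest-entropy examples are sent to an oracle for true labels (the active-learning step). The sketch below is an illustrative assumption, not the authors' exact algorithm: the base learner (a random forest), the batch sizes, the iteration count, and the toy data are all choices made here for demonstration.

```python
# Hedged sketch: self-training that combines semi-supervised pseudo-labeling
# (low-entropy predictions) with active querying (high-entropy predictions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def entropy(probs):
    """Shannon entropy of each row of class-probability estimates."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)


def self_train(X_lab, y_lab, X_unlab, oracle,
               n_iter=5, n_confident=20, n_query=5):
    """Grow the labeled set from the unlabeled pool; return a final model."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(n_iter):
        if len(X_unlab) < n_confident + n_query:
            break
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X_lab, y_lab)
        probs = clf.predict_proba(X_unlab)
        order = np.argsort(entropy(probs))
        conf_idx = order[:n_confident]    # most confident -> pseudo-label
        query_idx = order[-n_query:]      # most uncertain -> ask the oracle
        pseudo = clf.classes_[probs[conf_idx].argmax(axis=1)]
        queried = oracle(X_unlab[query_idx])
        chosen = np.concatenate([conf_idx, query_idx])
        X_lab = np.vstack([X_lab, X_unlab[chosen]])
        y_lab = np.concatenate([y_lab, pseudo, queried])
        X_unlab = np.delete(X_unlab, chosen, axis=0)
    return RandomForestClassifier(n_estimators=50, random_state=0).fit(X_lab, y_lab)


# Toy demonstration: the "oracle" simply looks up the true label.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
lookup = {row.tobytes(): int(lab) for row, lab in zip(X, y)}
oracle = lambda Xq: np.array([lookup[row.tobytes()] for row in Xq])
model = self_train(X[:20], y[:20], X[20:], oracle)
acc = model.score(X, y)
```

Entropy plays a double role here, matching the abstract: it identifies both the examples the model can safely label itself and the examples worth the cost of a human annotation.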