Combination of Active Learning and Semi-Supervised Learning under a Self-Training Scheme

Times Cited: 13
Authors
Fazakis, Nikos [1 ]
Kanas, Vasileios G. [1 ]
Aridas, Christos K. [2 ]
Karlos, Stamatis [2 ]
Kotsiantis, Sotiris [2 ]
Affiliations
[1] Univ Patras, Dept Elect & Comp Engn, Wired Commun Lab, Achaia 26504, Greece
[2] Univ Patras, Dept Math, Educ Software Dev Lab, Achaia 26504, Greece
Keywords
active learning; semi-supervised learning; self-training; classification; combination of learning methods; ROTATION FOREST; REGRESSION; SOFTWARE;
DOI
10.3390/e21100988
Chinese Library Classification
O4 [Physics];
Discipline Classification Code
0702
Abstract
One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that enriches the information contained in a dataset. In this context, a variety of learning methods have been studied in the literature that aim to utilize the vast amounts of unlabeled data efficiently during the learning process. The most common approaches tackle such problems by applying either active learning or semi-supervised learning methods individually. In this work, a combination of active learning and semi-supervised learning is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. The entropy and the class-probability distribution of the unlabeled set are used as effective and robust metrics for selecting the most suitable unlabeled examples with which to augment the initial labeled set. The superiority of the proposed scheme is validated by comparing it against the baseline supervised, semi-supervised, and active learning approaches on fifty-five benchmark datasets.
Pages: 28
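The scheme described in the abstract can be sketched as a single self-training loop: at each iteration, a base learner's class probabilities over the unlabeled pool are ranked by entropy; the lowest-entropy (most confident) examples are pseudo-labeled (the semi-supervised step), while the highest-entropy examples are sent to an oracle for true labels (the active-learning step). The sketch below is an illustrative assumption, not the authors' exact algorithm: the base learner (a random forest), the batch sizes, the iteration count, and the toy data are all choices made here for demonstration.

```python
# Hedged sketch: self-training that combines semi-supervised pseudo-labeling
# (low-entropy predictions) with active querying (high-entropy predictions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def entropy(probs):
    """Shannon entropy of each row of class-probability estimates."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)


def self_train(X_lab, y_lab, X_unlab, oracle,
               n_iter=5, n_confident=20, n_query=5):
    """Grow the labeled set from the unlabeled pool; return a final model."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(n_iter):
        if len(X_unlab) < n_confident + n_query:
            break
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X_lab, y_lab)
        probs = clf.predict_proba(X_unlab)
        order = np.argsort(entropy(probs))
        conf_idx = order[:n_confident]    # most confident -> pseudo-label
        query_idx = order[-n_query:]      # most uncertain -> ask the oracle
        pseudo = clf.classes_[probs[conf_idx].argmax(axis=1)]
        queried = oracle(X_unlab[query_idx])
        chosen = np.concatenate([conf_idx, query_idx])
        X_lab = np.vstack([X_lab, X_unlab[chosen]])
        y_lab = np.concatenate([y_lab, pseudo, queried])
        X_unlab = np.delete(X_unlab, chosen, axis=0)
    return RandomForestClassifier(n_estimators=50, random_state=0).fit(X_lab, y_lab)


# Toy demonstration: the "oracle" simply looks up the true label.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
lookup = {row.tobytes(): int(lab) for row, lab in zip(X, y)}
oracle = lambda Xq: np.array([lookup[row.tobytes()] for row in Xq])
model = self_train(X[:20], y[:20], X[20:], oracle)
acc = model.score(X, y)
```

Entropy plays a double role here, matching the abstract: it identifies both the examples the model can safely label itself and the examples worth the cost of a human annotation.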