Semi-supervised classification trees

Cited by: 0
Authors
Jurica Levatić
Michelangelo Ceci
Dragi Kocev
Sašo Džeroski
Affiliations
[1] Jožef Stefan Institute, Department of Knowledge Technologies
[2] Jožef Stefan International Postgraduate School, Department of Computer Science
[3] University of Bari Aldo Moro
Source
Journal of Intelligent Information Systems | 2017 / Vol. 49
Keywords
Semi-supervised learning; Binary classification; Multi-class classification; Decision trees; Random forests;
DOI
Not available
Abstract
In many real-life problems, obtaining labeled data can be a very expensive and laborious task, while unlabeled data can be abundant. The limited availability of labeled data can seriously limit the performance of supervised learning methods. Here, we propose a semi-supervised classification tree induction algorithm that can exploit both labeled and unlabeled data, while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance and producing readily interpretable models. Moreover, we further improve their predictive performance by using them as base predictive models in random forests. We performed an extensive empirical evaluation on 12 binary and 12 multi-class classification datasets. The results showed that the proposed methods improve the predictive performance of their supervised counterparts. Moreover, we show that, in cases with limited availability of labeled data, the semi-supervised decision trees often yield models that are smaller and easier to interpret than supervised decision trees.
Pages: 461-486
Number of pages: 25