Semi-Supervised Self-Training Feature Weighted Clustering Decision Tree and Random Forest

Cited by: 11
Authors
Liu, Zhenyu [1 ,2 ]
Wen, Tao [1 ,2 ]
Sun, Wei [2 ]
Zhang, Qilong [1 ]
Affiliations
[1] Northeastern Univ, Coll Comp Sci & Engn, Shenyang 110819, Peoples R China
[2] Dalian Neusoft Univ Informat, Dept Comp Sci & Technol, Dalian 116023, Peoples R China
Source
IEEE ACCESS | 2020 / Volume 8 / Issue 08
Keywords
Semi-supervised learning; self-training; decision tree; random forest; node splits; CLASSIFICATION;
DOI
10.1109/ACCESS.2020.3008951
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
A self-training algorithm is an iterative method for semi-supervised learning that wraps around a base learner and uses the learner's own predictions to assign labels to unlabeled data. For self-training, both the classification ability of the base learner and the estimation of prediction confidence are critical. A classical decision tree is ineffective as the base learner in self-training because it cannot reliably estimate the confidence of its own predictions. In this paper, we propose a novel node-split method for decision trees that clusters instances using weighted features; the method can combine multiple numerical and categorical features to split a node. The decision tree and random forest constructed with this method are called FWCDT and FWCRF, respectively. FWCDT and FWCRF classify better than classical univariate-split decision trees and forests when training instances are few, which makes them better suited as base classifiers for self-training. Moreover, building on the proposed node-split method, we also explore suitable prediction-confidence measures for FWCDT and for FWCRF. Finally, experimental results on UCI datasets show that the self-training feature weighted clustering decision tree (ST-FWCDT) and random forest (ST-FWCRF) can effectively exploit unlabeled data, and the resulting classifiers have better generalization ability.
Pages: 128337-128348
Page count: 12
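The abstract's core loop can be illustrated with a minimal self-training sketch. This is an assumption-laden illustration, not the paper's method: it uses scikit-learn's standard DecisionTreeClassifier as a stand-in base learner (the paper's FWCDT/FWCRF feature-weighted clustering split is not reproduced), and the confidence threshold and iteration cap are arbitrary example values.

```python
# Minimal self-training sketch (illustrative only).
# Assumptions: scikit-learn DecisionTreeClassifier as a placeholder base
# learner instead of FWCDT/FWCRF; predict_proba as the prediction-confidence
# estimate; threshold 0.9 and max_iter 10 are made-up example values.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def self_training(X_labeled, y_labeled, X_unlabeled,
                  confidence=0.9, max_iter=10):
    X_l, y_l = np.asarray(X_labeled), np.asarray(y_labeled)
    X_u = np.asarray(X_unlabeled)
    clf = DecisionTreeClassifier()
    for _ in range(max_iter):
        clf.fit(X_l, y_l)                        # retrain on the enlarged labeled set
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)           # per-class prediction confidence
        conf_mask = proba.max(axis=1) >= confidence
        if not conf_mask.any():                  # nothing confident enough: stop
            break
        pseudo_labels = clf.classes_[proba.argmax(axis=1)]
        # move confidently predicted instances into the labeled set
        X_l = np.vstack([X_l, X_u[conf_mask]])
        y_l = np.concatenate([y_l, pseudo_labels[conf_mask]])
        X_u = X_u[~conf_mask]
    return clf
```

In the paper's setting, the placeholder tree would be replaced by FWCDT or FWCRF, whose feature-weighted clustering splits and tailored confidence measures are what make the pseudo-labeling step reliable.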