A semi-supervised approach to growing classification trees

Cited by: 0
Authors
Santhiappan, Sudarsun [1 ]
Ravindran, Balaraman [2 ]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] IIT Madras, Dept Comp Sci & Engn, Robert Bosch Ctr Data Sci & AI RBC DSAI, Chennai, Tamil Nadu, India
Source
CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD) | 2021
Keywords
Classification trees; Maximum mean discrepancy; Class ratio estimation; Semi-supervised methods;
DOI
10.1145/3430984.3431009
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
A classification tree is grown by repeatedly partitioning the dataset according to a predefined split criterion. Each node split during growth depends only on the class ratio of the data chunk being split at that internal node. Consequently, when the class ratio of the unlabeled part of the dataset is available, the unlabeled data can be used alongside the labeled data to train the tree in a semi-supervised fashion. Our motivation is to exploit the abundantly available unlabeled data for building classification trees, since acquiring labels is laborious and expensive. In this paper, we propose a semi-supervised approach to growing classification trees, in which we adapt the Maximum Mean Discrepancy (MMD) method to estimate the class ratio at every node split. In experiments on several binary and multiclass classification datasets, we observed that our semi-supervised approach to growing a classification tree is statistically better than traditional decision tree algorithms on 31 of 40 datasets.
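The core computation the abstract refers to, estimating the class ratio of an unlabeled data chunk by matching kernel mean embeddings (an MMD-style criterion), can be sketched as follows. This is a minimal illustrative sketch under assumptions, not the authors' implementation: the RBF kernel, the bandwidth parameter `gamma`, and the simplex-constrained solver are illustrative choices, and the function names are hypothetical.

```python
# Minimal sketch (not the paper's code): estimate class proportions of an
# unlabeled pool by choosing the mixture weights theta that minimise the
# squared MMD between the kernel mean embedding of the unlabeled data and the
# theta-weighted mixture of class-conditional embeddings of the labeled data.
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)


def estimate_class_ratio(X_lab, y_lab, X_unl, gamma=1.0):
    """Return (classes, estimated class proportions of X_unl)."""
    classes = np.unique(y_lab)
    k = len(classes)
    parts = [X_lab[y_lab == c] for c in classes]

    # A[i, j] = <mu_i, mu_j>, b[i] = <mu_i, mu_U> in the RKHS,
    # where mu_i is the empirical mean embedding of class i and mu_U that
    # of the unlabeled pool; both reduce to means of kernel evaluations.
    A = np.empty((k, k))
    b = np.empty(k)
    for i in range(k):
        b[i] = rbf_kernel(parts[i], X_unl, gamma).mean()
        for j in range(k):
            A[i, j] = rbf_kernel(parts[i], parts[j], gamma).mean()

    # Minimise theta^T A theta - 2 b^T theta over the probability simplex.
    def obj(t):
        return t @ A @ t - 2.0 * b @ t

    cons = ({"type": "eq", "fun": lambda t: t.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * k
    res = minimize(obj, np.full(k, 1.0 / k), bounds=bounds, constraints=cons)
    theta = np.clip(res.x, 0.0, None)
    return classes, theta / theta.sum()
```

In a semi-supervised tree grower of the kind described, such an estimator could be invoked on the unlabeled samples routed to each candidate child node, and the estimated ratios combined with the labeled counts when evaluating the split criterion; the exact way the paper integrates the estimates into the split score is not reproduced here.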
Pages: 29-37
Number of pages: 9