A semi-supervised approach to growing classification trees

Cited by: 0
Authors
Santhiappan, Sudarsun [1]
Ravindran, Balaraman [2]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] IIT Madras, Dept Comp Sci & Engn, Robert Bosch Ctr Data Sci & AI RBC DSAI, Chennai, Tamil Nadu, India
Source
CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD) | 2021
Keywords
Classification trees; Maximum mean discrepancy; Class ratio estimation; Semi-supervised methods;
DOI
10.1145/3430984.3431009
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A classification tree is grown by repeatedly partitioning the dataset according to a predefined split criterion. Each node split during the growth process depends only on the class ratio of the data partition reaching that internal node. In a classification tree learning task, when the class ratio of the unlabeled part of the dataset is available, it becomes feasible to use the unlabeled data alongside the labeled data to train the tree in a semi-supervised manner. Our motivation is to facilitate the use of the abundantly available unlabeled data for building classification trees, as acquiring labels is laborious and expensive. In this paper, we propose a semi-supervised approach to growing classification trees, in which we adapt the Maximum Mean Discrepancy (MMD) method to estimate the class ratio at every node split. In our experiments on several binary and multiclass classification datasets, we observed that our semi-supervised approach to growing a classification tree is statistically better than traditional decision tree algorithms on 31 of 40 datasets.
Pages: 29 - 37
Number of pages: 9