A semi-supervised approach to growing classification trees

Cited by: 0
Authors
Santhiappan, Sudarsun [1]
Ravindran, Balaraman [2]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] IIT Madras, Dept Comp Sci & Engn, Robert Bosch Ctr Data Sci & AI RBC DSAI, Chennai, Tamil Nadu, India
Source
CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD) | 2021
Keywords
Classification trees; Maximum mean discrepancy; Class ratio estimation; Semi-supervised methods;
DOI
10.1145/3430984.3431009
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A classification tree is grown by repeatedly partitioning the dataset according to a predefined split criterion. Each node split during the growth process depends only on the class ratio of the data partition reaching that internal node. In a classification tree learning task, when the class ratio of the unlabeled part of the dataset is available, it becomes feasible to use the unlabeled data alongside the labeled data to train the tree in a semi-supervised manner. Our motivation is to facilitate the use of the abundantly available unlabeled data for building classification trees, as acquiring labels is laborious and expensive. In this paper, we propose a semi-supervised approach to growing classification trees, in which we adapt the Maximum Mean Discrepancy (MMD) method to estimate the class ratio at every node split. In our experiments on several binary and multiclass classification datasets, we observed that our semi-supervised approach to growing a classification tree is statistically better than traditional decision tree algorithms on 31 of 40 datasets.
Pages: 29 - 37
Number of pages: 9