A semi-supervised approach to growing classification trees

Cited by: 0
Authors
Santhiappan, Sudarsun [1 ]
Ravindran, Balaraman [2 ]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] IIT Madras, Dept Comp Sci & Engn, Robert Bosch Ctr Data Sci & AI RBC DSAI, Chennai, Tamil Nadu, India
Source
CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD) | 2021
Keywords
Classification trees; Maximum mean discrepancy; Class ratio estimation; Semi-supervised methods;
DOI
10.1145/3430984.3431009
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
A classification tree is grown by repeatedly partitioning the dataset according to a predefined split criterion. Each node split during growth depends only on the class ratio of the data chunk being split at that internal node. Consequently, when the class ratio of the unlabeled part of the dataset is available, the unlabeled data can be used alongside the labeled data to train the tree in a semi-supervised fashion. Our motivation is to exploit the abundantly available unlabeled data for building classification trees, since acquiring labels is laborious and expensive. In this paper, we propose a semi-supervised approach to growing classification trees, in which we adapt the Maximum Mean Discrepancy (MMD) method to estimate the class ratio at every node split. In experiments on several binary and multiclass classification datasets, we observed that our semi-supervised approach to growing a classification tree is statistically better than traditional decision tree algorithms on 31 of 40 datasets.
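The core computation the abstract refers to, estimating the class ratio of an unlabeled data chunk by matching kernel mean embeddings (an MMD-style criterion), can be sketched as follows. This is a minimal illustrative sketch under assumptions, not the authors' implementation: the RBF kernel, the bandwidth parameter `gamma`, and the simplex-constrained solver are illustrative choices, and the function names are hypothetical.

```python
# Minimal sketch (not the paper's code): estimate class proportions of an
# unlabeled pool by choosing the mixture weights theta that minimise the
# squared MMD between the kernel mean embedding of the unlabeled data and the
# theta-weighted mixture of class-conditional embeddings of the labeled data.
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)


def estimate_class_ratio(X_lab, y_lab, X_unl, gamma=1.0):
    """Return (classes, estimated class proportions of X_unl)."""
    classes = np.unique(y_lab)
    k = len(classes)
    parts = [X_lab[y_lab == c] for c in classes]

    # A[i, j] = <mu_i, mu_j>, b[i] = <mu_i, mu_U> in the RKHS,
    # where mu_i is the empirical mean embedding of class i and mu_U that
    # of the unlabeled pool; both reduce to means of kernel evaluations.
    A = np.empty((k, k))
    b = np.empty(k)
    for i in range(k):
        b[i] = rbf_kernel(parts[i], X_unl, gamma).mean()
        for j in range(k):
            A[i, j] = rbf_kernel(parts[i], parts[j], gamma).mean()

    # Minimise theta^T A theta - 2 b^T theta over the probability simplex.
    def obj(t):
        return t @ A @ t - 2.0 * b @ t

    cons = ({"type": "eq", "fun": lambda t: t.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * k
    res = minimize(obj, np.full(k, 1.0 / k), bounds=bounds, constraints=cons)
    theta = np.clip(res.x, 0.0, None)
    return classes, theta / theta.sum()
```

In a semi-supervised tree grower of the kind described, such an estimator could be invoked on the unlabeled samples routed to each candidate child node, and the estimated ratios combined with the labeled counts when evaluating the split criterion; the exact way the paper integrates the estimates into the split score is not reproduced here.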
Pages: 29-37
Number of pages: 9