Conformalized Semi-supervised Random Forest for Classification and Abnormality Detection

Times cited: 0
Authors
Han, Yujin [1 ,4 ]
Xu, Mingwenchan [2 ,4 ]
Guan, Leying [3 ]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Northwestern Univ, Dept IEMS, Evanston, IL USA
[3] Yale Univ, Dept Biostat, New Haven, CT 06520 USA
[4] Yale Univ, New Haven, CT USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024
Keywords
PREDICTIVE INFERENCE; COVARIATE SHIFT;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Random Forests classifier, a widely used off-the-shelf classification tool, assumes, like other standard classifiers, that training and test samples come from the same distribution. However, in safety-critical scenarios such as medical diagnosis and network attack detection, discrepancies between the training and test sets, including novel outlier samples that never appeared during training, can pose significant challenges. To address this problem, we introduce the Conformalized Semi-Supervised Random Forest (CSForest), which couples the conformalization technique Jackknife+aB with semi-supervised tree ensembles to construct a set-valued prediction C(x). Instead of optimizing over the training distribution, CSForest uses unlabeled test samples to improve accuracy and flags unseen outliers by returning an empty set. Theoretically, we show that CSForest covers the true labels of previously observed inlier classes under arbitrary label shift in the test data. We compare CSForest with state-of-the-art methods on synthetic examples and a variety of real-world datasets under different types of distribution shift in the test domain. Our results highlight CSForest's effective prediction of inliers and its ability to detect outlier samples unique to the test data. In addition, CSForest performs consistently well as the sizes of the training and test sets vary. Code for CSForest is available at https://github.com/yujinhan98/CSForest
Pages: 22
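To make the set-valued prediction C(x) from the abstract concrete, here is a minimal Python sketch of class-conditional split conformal prediction with an empty-set outlier flag. It is not CSForest: it swaps the paper's Jackknife+aB conformalization and semi-supervised tree ensembles for a plain calibration split and a toy distance-to-centroid nonconformity score. All names (`centroid`, `score`, `conformal_quantile`, `fit_thresholds`, `predict_set`) and the toy data are hypothetical. What it shares with the described approach is the form of the output: C(x) may contain one label, several labels, or none, and an empty set is read as a flagged outlier.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def score(x: Point, center: Point) -> float:
    """Toy nonconformity score (assumption): Euclidean distance to a class centroid."""
    return math.dist(x, center)


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """Return the ceil((n + 1)(1 - alpha))-th smallest score, or +inf if that rank exceeds n."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: never exclude this class
    return sorted(scores)[rank - 1]


def fit_thresholds(
    train: Dict[str, List[Point]],
    calib: Dict[str, List[Point]],
    alpha: float,
) -> Tuple[Dict[str, Point], Dict[str, float]]:
    """Fit one centroid per class on `train`, then one threshold per class on `calib`."""
    centers = {k: centroid(pts) for k, pts in train.items()}
    # Class-conditional calibration: each threshold uses only that class's calibration scores.
    thresholds = {
        k: conformal_quantile([score(x, centers[k]) for x in calib[k]], alpha)
        for k in centers
    }
    return centers, thresholds


def predict_set(x: Point, centers: Dict[str, Point], thresholds: Dict[str, float]) -> Set[str]:
    """C(x): every class whose threshold x passes; an empty set flags x as a possible outlier."""
    return {k for k in centers if score(x, centers[k]) <= thresholds[k]}


if __name__ == "__main__":
    train = {"a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
             "b": [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]}
    calib = {"a": [(0.5, 0.5), (0.2, 0.1), (1.0, 1.0), (0.0, 0.8)],
             "b": [(10.5, 10.5), (10.2, 10.1), (11.0, 11.0), (10.0, 10.8)]}
    centers, thresholds = fit_thresholds(train, calib, alpha=0.25)
    for x in [(0.4, 0.4), (5.0, 5.0), (10.3, 10.4)]:
        print(x, predict_set(x, centers, thresholds))
```

With this toy data the three queries yield {'a'}, an empty set (the point between the two clusters is flagged), and {'b'}. Because each class gets its own threshold computed only from that class's calibration points, the per-class guarantee P(k ∈ C(X) | Y = k) ≥ 1 - alpha (for exchangeable data) does not depend on class proportions. This offers one intuition for why class-conditional calibration is relevant under label shift, where the distribution of X given Y is unchanged.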