Conformalized Semi-supervised Random Forest for Classification and Abnormality Detection

Cited by: 0

Authors:

Han, Yujin [1,4]

Xu, Mingwenchan [2,4]

Guan, Leying [3]

Affiliations:

[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

[2] Northwestern Univ, Dept IEMS, Evanston, IL USA

[3] Yale Univ, Dept Biostat, New Haven, CT 06520 USA

[4] Yale Univ, New Haven, CT USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024 / Vol. 238

Keywords:

PREDICTIVE INFERENCE; COVARIATE SHIFT;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Random Forests classifier, a widely used off-the-shelf classification tool, assumes, like other standard classifiers, that training and test samples come from the same distribution. However, in safety-critical scenarios like medical diagnosis and network attack detection, discrepancies between the training and test sets, including the potential presence of novel outlier samples not appearing during training, can pose significant challenges. To address this problem, we introduce the Conformalized Semi-Supervised Random Forest (CSForest), which couples the conformalization technique Jackknife+aB with semi-supervised tree ensembles to construct a set-valued prediction C(x). Instead of optimizing over the training distribution, CSForest employs unlabeled test samples to enhance accuracy and flags unseen outliers by generating an empty set. Theoretically, we establish that CSForest covers true labels for previously observed inlier classes under arbitrary label shift in the test data. We compare CSForest with state-of-the-art methods using synthetic examples and various real-world datasets, under different types of distribution changes in the test domain. Our results highlight CSForest's effective prediction of inliers and its ability to detect outlier samples unique to the test data. In addition, CSForest shows persistently good performance as the sizes of the training and test sets vary. Code for CSForest is available at https://github.com/yujinhan98/CSForest
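
The idea of a set-valued prediction C(x) that becomes empty for an outlier can be illustrated with a much simpler stand-in. The sketch below is not CSForest: it does not use Jackknife+aB, tree ensembles, or unlabeled test samples. It assumes plain class-wise split-conformal p-values computed from hypothetical conformity scores (higher meaning more typical of the class). The function names, the scores, and the level `alpha` are all illustrative assumptions.

```python
def conformal_p_value(cal_scores, test_score):
    """Conformal p-value: share of calibration scores, counting the
    test point itself, that are no larger than the test score."""
    n_le = sum(1 for s in cal_scores if s <= test_score)
    return (1 + n_le) / (len(cal_scores) + 1)


def prediction_set(cal_scores_by_class, test_scores_by_class, alpha=0.1):
    """Keep every label whose class-wise p-value exceeds alpha.
    An empty result means no known class looks plausible."""
    return {
        label
        for label, cal in cal_scores_by_class.items()
        if conformal_p_value(cal, test_scores_by_class[label]) > alpha
    }


# Hypothetical conformity scores of held-out labeled points, grouped by class.
cal_scores = {
    "A": [0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99],
    "B": [0.50, 0.65, 0.70, 0.80, 0.85, 0.88, 0.92, 0.96, 0.98],
}

# A point typical of class A: only "A" is retained.
inlier_set = prediction_set(cal_scores, {"A": 0.90, "B": 0.10}, alpha=0.2)

# A point atypical of both classes: the set is empty, flagging an outlier.
outlier_set = prediction_set(cal_scores, {"A": 0.30, "B": 0.30}, alpha=0.2)

# A borderline point: both labels are retained.
ambiguous_set = prediction_set(cal_scores, {"A": 0.70, "B": 0.70}, alpha=0.2)
```

In this toy setup an empty set plays the role of the outlier flag mentioned in the abstract, while a set with several labels signals ambiguity between known classes.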


Pages: 22
