Meta-distribution-based ensemble sampler for imbalanced semi-supervised learning

Citations: 0
Authors
Ning, Zhihan [1]
Guo, Chaoxun [1]
Zhang, David [2]
Affiliations
[1] Chinese Univ Hong Kong, Sch Sci & Engn, Shenzhen 518172, Guangdong, Peoples R China
[2] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Guangdong, Peoples R China
Keywords
Semi-supervised learning; Imbalanced data; Ensemble learning; Data resampling; Histogram discretization
DOI
10.1016/j.patcog.2025.111552
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semi-supervised learning (SSL) on imbalanced data is largely under-explored and suffers from erroneous pseudo-labels, biased model training, or intolerable training costs. To alleviate these issues, we propose a meta-distribution-based ensemble sampler (MDSampler) approach for imbalanced SSL. MDSampler is a unified framework that integrates SSL, imbalanced learning, and ensemble learning via iterative instance under-sampling and cascade classifier aggregation. Specifically, MDSampler considers the confidence-diversity distribution of both labeled and unlabeled samples and obtains the so-called meta-distribution via 2-D histogram discretization. Sampling on the meta-distribution (1) assigns pseudo-labels to unlabeled data for SSL, (2) alleviates class imbalance since the sampling process is unbiased, (3) improves the diversity of the ensemble learning framework, and (4) is highly efficient and flexible. Additionally, an adaptive instance interpolation strategy is presented to improve the quality of pseudo-labeled samples. Extensive experiments show that MDSampler can be organically combined with various classifiers to achieve superior performance in imbalanced SSL.
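The abstract does not specify how confidence and diversity are computed or how samples are drawn from the meta-distribution, so the following is only a minimal sketch of the general idea of 2-D histogram discretization followed by cell-balanced under-sampling. The function name `sample_from_meta_distribution`, the parameters `n_bins` and `n_samples`, the assumption that both scores lie in [0, 1], and the inverse-cell-count weighting are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import numpy as np


def sample_from_meta_distribution(confidence, diversity, n_bins=5,
                                  n_samples=20, seed=0):
    """Illustrative sketch only (not the paper's algorithm).

    Discretizes per-sample (confidence, diversity) scores, assumed to lie
    in [0, 1], into an n_bins x n_bins grid, then under-samples so that
    every non-empty grid cell carries equal total probability mass.
    """
    rng = np.random.default_rng(seed)
    confidence = np.asarray(confidence, dtype=float)
    diversity = np.asarray(diversity, dtype=float)

    # Interior bin edges on [0, 1]; digitize maps each score to 0..n_bins-1.
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    conf_bin = np.digitize(confidence, interior_edges)
    div_bin = np.digitize(diversity, interior_edges)

    # Flatten the 2-D cell index and count the samples in each cell.
    cell = conf_bin * n_bins + div_bin
    counts = np.bincount(cell, minlength=n_bins * n_bins)

    # Hypothetical weighting: inverse cell count, so crowded cells are
    # under-sampled and sparse cells are not drowned out.
    weights = 1.0 / counts[cell]
    weights /= weights.sum()

    n_samples = min(n_samples, len(cell))
    chosen = rng.choice(len(cell), size=n_samples, replace=False, p=weights)
    return chosen, counts.reshape(n_bins, n_bins)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(42)
    conf = demo_rng.beta(5, 1, size=200)   # synthetic scores skewed toward 1
    div = demo_rng.random(200)             # synthetic uniform scores
    idx, hist = sample_from_meta_distribution(conf, div)
    print("selected indices:", np.sort(idx))
    print("2-D histogram of cell counts:\n", hist)
```

In this sketch, the 2-D count grid stands in for the meta-distribution, and the returned indices would be the instances passed to one base classifier of an ensemble; repeating the draw with different seeds would give each ensemble member a different subset.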
Pages: 13